Question regarding GSoC 2026

OverTinker0827 · 20 March 2026 19:21

Hello all, I am interested in several ideas from the ideas list. Can I submit proposals for multiple ideas? If yes, can I put them in one proposal, or do I need to write a separate one for each? Thank you.

laurent · 20 March 2026 19:56

Hello, yes, you can have multiple proposals, and they should all be in separate posts.

OverTinker0827 · 20 March 2026 20:54

Ok, but for the final GSoC proposal submission, does it need to be different submissions or just one submission?

laurent · 20 March 2026 22:32

It should be different ones

OverTinker0827 · 21 March 2026 04:38

Ok, thank you.

Divya-A10 · 22 March 2026 13:57

Hi @shikuz, @HahaBill, and @malekhavasi,

I'm Divya, applying for Idea 4: Chat with your note collection using AI. I've been working through the proposal and have a couple of design questions I'd love your input on before I finalise.

Chunking strategy: Joplin notes tend to be short, informal, and mixed-content, so I'm leaning toward sentence-level chunking with a semantic merge step (grouping sentences within the same semantic unit) rather than fixed-size chunking. Has anyone explored this in the context of Joplin notes, or is there a preferred approach the mentors have in mind?

Embedding model: I'm evaluating all-MiniLM-L6-v2 (fast, local, 80 MB) as the default, with nomic-embed-text as the Ollama alternative. Given that Joplin already supports local model execution, does the team have a preference for keeping the embedding stack consistent with the existing AI plugin patterns?

Happy to discuss further. These decisions will shape the core architecture, so I want to get them right early.

Thanks!
Divya
GitHub: Divya-A10

shikuz · 25 March 2026 13:10

Hey @Divya-A10, on the embedding model - Xenova/bge-small-en-v1.5 is worth considering over all-MiniLM (better retrieval benchmarks at similar size). The AI summarisation plugin by @HahaBill is a good reference for how Transformers.js runs inside a Joplin plugin.
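For anyone following along, "retrieval" here means ranking stored note chunks by how close their embedding is to the query embedding. Below is a minimal sketch of that ranking step, assuming the vectors have already been computed by whichever model is chosen. `IndexedChunk`, `cosineSimilarity`, and `topK` are hypothetical names for illustration, not part of Joplin or Transformers.js, and the 3-dimensional vectors in the usage lines are made up (real models such as bge-small-en-v1.5 produce 384-dimensional vectors).

```typescript
// Hypothetical shape of one indexed chunk; not an existing Joplin type.
interface IndexedChunk {
  noteId: string;
  text: string;
  vector: number[];
}

// Cosine similarity between two dense vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vector dimensions differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // Treat an all-zero vector as having no similarity to anything.
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Return the k chunks most similar to the query vector, best match first.
function topK(query: number[], index: IndexedChunk[], k: number): IndexedChunk[] {
  return index
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.vector) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}

// Usage with made-up vectors.
const demoIndex: IndexedChunk[] = [
  { noteId: "n1", text: "Grocery list", vector: [0.9, 0.1, 0] },
  { noteId: "n2", text: "Meeting minutes", vector: [0, 1, 0] },
  { noteId: "n3", text: "Recipe ideas", vector: [0.5, 0.5, 0] },
];
const bestMatches = topK([1, 0, 0], demoIndex, 2);
console.log(bestMatches.map((chunk) => chunk.noteId));
```

A linear scan like this is only a starting point for small collections; the ranking logic stays the same whichever embedding model produces the vectors.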

On chunking - Joplin collections vary a lot: the same user might have short fleeting notes alongside long structured documents or web clippings. What similarity threshold are you thinking for the merge step, and how does the approach behave across that range?

For the broader question on what direction mentors have in mind, the AI scoping discussion is worth reading before finalising a proposal.

Feel free to open your own thread for proposal discussion - the submission template has the structure to follow.

Divya-A10 · 25 March 2026 14:44

Hi @shikuz, thank you, this is really helpful!

The suggestion about Xenova/bge-small-en-v1.5 makes sense, especially if it offers better retrieval performance at a similar size. I’ll look into it and also check the summarisation plugin for how Transformers.js is integrated.

For chunking, I was initially thinking of a semantic similarity threshold-based merge (e.g., grouping adjacent sentences if their similarity exceeds a threshold), but I haven’t fixed a value yet. My idea was to experiment with a range of thresholds and evaluate how each performs across different note types (short notes vs longer structured content), since, as you mentioned, Joplin collections can vary significantly.
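To make the merge step concrete, here is a rough sketch of what I have in mind, not a final design. All names (`splitSentences`, `toyEmbed`, `mergeAdjacent`) are hypothetical, and `toyEmbed` is just a bag-of-words stand-in for a real embedding model so the example runs without downloading anything. The thresholds used are arbitrary placeholders, not values I'm proposing.

```typescript
// Sparse word-count vector used by the toy embedder below.
type SparseVector = Map<string, number>;
type Embed = (text: string) => SparseVector;

// Naive splitter (assumption: '.', '!' and '?' end a sentence).
function splitSentences(text: string): string[] {
  const parts = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return parts.map((part) => part.trim()).filter((part) => part.length > 0);
}

// Stand-in for a real embedding model: lower-cased word counts.
const toyEmbed: Embed = (text) => {
  const counts: SparseVector = new Map<string, number>();
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
};

// Cosine similarity between two sparse vectors.
function cosine(a: SparseVector, b: SparseVector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((value, key) => {
    dot += value * (b.get(key) ?? 0);
    normA += value * value;
  });
  b.forEach((value) => {
    normB += value * value;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Append a sentence to the current chunk when it is similar enough to the
// previous sentence; otherwise close the chunk and start a new one.
function mergeAdjacent(sentences: string[], threshold: number, embed: Embed = toyEmbed): string[] {
  const chunks: string[] = [];
  let current = "";
  let previous: SparseVector | null = null;
  for (const sentence of sentences) {
    const vector = embed(sentence);
    if (previous !== null && cosine(previous, vector) >= threshold) {
      current += " " + sentence;
    } else {
      if (current !== "") chunks.push(current);
      current = sentence;
    }
    previous = vector;
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

// Sweep a few placeholder thresholds over one sample note.
const sampleNote = "Buy milk and eggs. Buy eggs and bread. Call the plumber tomorrow.";
const sampleSentences = splitSentences(sampleNote);
for (const threshold of [0.3, 0.9]) {
  console.log(threshold, mergeAdjacent(sampleSentences, threshold));
}
```

The plan would be to run the same sweep over a mix of short notes, long structured notes, and web clippings, then compare chunk counts and sizes per note type before settling on a value.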

I’ll also go through the AI scoping discussion before finalising the proposal direction.

I’ll open a separate thread for proposal discussion shortly to keep things structured. Thanks again for the guidance!
