Hello all, I am interested in several ideas from the ideas list. Can I submit proposals for multiple ideas? If so, can I combine them into one proposal, or do I need to write separate ones? Thank you
Hello, yes, you can submit multiple proposals, and each one should be in a separate post.
OK, but for the final GSoC proposal submission, do they need to be separate submissions or just one?
They should be separate submissions.
Ok, thank you.
Hi @shikuz, @HahaBill, and @malekhavasi,
I'm Divya, applying for Idea 4: Chat with your note collection using AI. I've been working through the proposal and have a couple of design questions I'd love your input on before I finalise.
Chunking strategy: Joplin notes tend to be short, informal, and mixed-content, so I'm leaning toward sentence-level chunking with a semantic merge step (grouping adjacent sentences that belong to the same semantic unit) rather than fixed-size chunking. Has anyone explored this for Joplin notes, or do the mentors have a preferred approach in mind?
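A rough sketch of the sentence-splitting step that would feed the merge, assuming each note body arrives as a Markdown string. `splitIntoSentences` is a hypothetical helper, and the splitting rule is a naive heuristic: every non-empty line (heading, list item, short jotting) becomes its own unit, and lines are further split after `.`, `!` or `?` followed by whitespace. It does not handle abbreviations such as "e.g." or code blocks; that would need refining.

```typescript
// Hypothetical helper: split a Joplin note body (Markdown) into sentence-level units.
// Naive heuristic, assumed for illustration only:
//  - each non-empty line is its own block (headings, list items, fleeting one-liners)
//  - within a line, split after ., ! or ? when followed by whitespace
function splitIntoSentences(markdown: string): string[] {
  const sentences: string[] = [];
  for (const rawLine of markdown.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.length === 0) continue; // skip blank lines between blocks
    // Insert a newline after sentence-ending punctuation + whitespace, then split on it.
    // "v3.5" stays intact because the period is not followed by whitespace.
    const parts = line.replace(/([.!?])\s+/g, "$1\n").split("\n");
    for (const part of parts) {
      const sentence = part.trim();
      if (sentence.length > 0) sentences.push(sentence);
    }
  }
  return sentences;
}
```

Splitting on line breaks first keeps list items and headings apart even when they lack terminal punctuation, which is common in quick notes.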
Embedding model: I'm evaluating all-MiniLM-L6-v2 (fast, local, 80MB) as the default, with nomic-embed-text as the Ollama alternative. Given that Joplin already supports local model execution, does the team prefer keeping the embedding stack consistent with the existing AI plugin patterns?
Happy to discuss further; these decisions will shape the core architecture, so I want to get them right early.
Thanks!
Divya
GitHub: Divya-A10
Hey @Divya-A10, on the embedding model - Xenova/bge-small-en-v1.5 is worth considering over all-MiniLM (better retrieval benchmarks at similar size). The AI summarisation plugin by @HahaBill is a good reference for how Transformers.js runs inside a Joplin plugin.
On chunking: Joplin collections vary a lot; the same user might have short fleeting notes alongside long structured documents or web clippings. What similarity threshold are you thinking of for the merge step, and how does the approach behave across that range?
For the broader question on what direction mentors have in mind, the AI scoping discussion is worth reading before finalising a proposal.
Feel free to open your own thread for proposal discussion - the submission template has the structure to follow.
Hi @shikuz, thank you, this is really helpful!
The suggestion about Xenova/bge-small-en-v1.5 makes sense, especially if it offers better retrieval performance at a similar size. I’ll look into it and also check the summarisation plugin for how Transformers.js is integrated.
For chunking, I was initially thinking of a threshold-based semantic merge (e.g., grouping adjacent sentences if their similarity exceeds a threshold), but I haven't fixed a value yet. My plan is to experiment with a range of values and evaluate how each performs across different note types (short notes vs. longer structured content), since, as you mentioned, Joplin collections can vary significantly.
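A minimal sketch of the merge step described above, assuming one embedding vector per sentence has already been computed by whichever model is chosen (for example Xenova/bge-small-en-v1.5 through Transformers.js; that call is not shown here). `mergeAdjacentSentences` and `cosineSimilarity` are hypothetical names, and the threshold is deliberately a parameter since no value has been chosen yet. Each sentence is compared only with the sentence immediately before it.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical merge step: walk the sentences in order and append each one to the
// current chunk when its similarity to the preceding sentence exceeds `threshold`;
// otherwise close the current chunk and start a new one.
function mergeAdjacentSentences(
  sentences: string[],
  embeddings: number[][],
  threshold: number,
): string[] {
  if (sentences.length !== embeddings.length) {
    throw new Error("One embedding per sentence is required");
  }
  const chunks: string[] = [];
  let current: string[] = [];
  for (let i = 0; i < sentences.length; i++) {
    if (i > 0 && cosineSimilarity(embeddings[i - 1], embeddings[i]) <= threshold) {
      chunks.push(current.join(" ")); // similarity too low: boundary between chunks
      current = [];
    }
    current.push(sentences[i]);
  }
  if (current.length > 0) chunks.push(current.join(" "));
  return chunks;
}
```

Because the threshold is an argument, the same note can be run through several values to compare how many chunks each produces for a short note versus a long clipping. Comparing only neighbouring sentences is a simplification; comparing against a running chunk average is an alternative worth testing.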
I’ll also go through the AI scoping discussion before finalising the proposal direction.
I’ll open a separate thread for proposal discussion shortly to keep things structured. Thanks again for the guidance!