GSoC 2026 Proposal Draft – Idea 3: AI-based-Note-categorization

shikuz · 29 March 2026 10:54

Sorry, I accidentally posted this in the other proposal thread first.

The model comparison table lists BGE-small-en-v1.5 at 256 tokens and all-MiniLM-L6-v2 at 512. I think those are swapped: as far as I know, BGE-small-en-v1.5 accepts up to 512 tokens, while all-MiniLM-L6-v2 truncates input at 256. Since the context window drives your chunking decisions, does the model choice change once the specs are corrected?
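To make the chunking concern concrete, here is a rough sketch of how the token limit changes the number of chunks per note. It is only an illustration: `chunk_tokens` is a hypothetical helper, not something from the proposal, and it assumes the note has already been tokenized by the embedding model's own tokenizer.

```python
def chunk_tokens(tokens, max_len, overlap=0):
    """Split a token list into windows of at most max_len tokens.

    Hypothetical helper for illustration only. `tokens` is assumed to be
    the output of the embedding model's tokenizer; `overlap` is the number
    of tokens shared between consecutive windows.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be in [0, max_len)")

    step = max_len - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_len])
        # Stop once this window already reaches the end of the note.
        if start + max_len >= len(tokens):
            break
    return chunks


# A stand-in for a 1000-token note.
note_tokens = list(range(1000))
print(len(chunk_tokens(note_tokens, max_len=256)))  # 4 chunks
print(len(chunk_tokens(note_tokens, max_len=512)))  # 2 chunks
```

With a 256-token limit the same note needs twice as many embeddings as with 512, which affects both indexing time and storage, so the corrected numbers could plausibly change which model comes out ahead.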

sqlite3 works on desktop but isn't available on Joplin mobile (at the moment). Is mobile out of scope, or have you thought about a storage path that works on both?

The proposal covers incremental re-embedding via the Events API, but what happens to the clusters when a user creates a new note? Does the whole UMAP + K-Means pipeline re-run, or is there a lighter path?
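One lighter path I could imagine, sketched below under my own assumptions rather than anything stated in the proposal: keep the centroids from the last full K-Means run and assign a new note to the nearest one, re-running the full pipeline only when the note sits far from every centroid. `assign_to_nearest_centroid` is a hypothetical name, and the sketch assumes the new note's vector is already in the same space as the stored centroids (if K-Means runs on UMAP-reduced vectors, the new embedding would first need to be projected with the fitted reducer).

```python
import math


def assign_to_nearest_centroid(vector, centroids):
    """Return (cluster_index, distance) for the closest stored centroid.

    Hypothetical helper for illustration only. `vector` and every entry of
    `centroids` are assumed to be equal-length sequences of floats living
    in the same space the clustering was fitted in.
    """
    if not centroids:
        raise ValueError("no centroids available; run the full pipeline first")

    distances = [math.dist(vector, c) for c in centroids]
    best = min(range(len(centroids)), key=lambda i: distances[i])
    return best, distances[best]


# Two stored centroids and one new note vector (toy 2-D values).
centroids = [[0.0, 0.0], [10.0, 10.0]]
cluster, distance = assign_to_nearest_centroid([9.0, 9.0], centroids)
print(cluster)  # 1
```

The returned distance could be compared against a threshold to decide when a full UMAP + K-Means re-run is worth the cost; what that threshold should be is something only testing on real notes would tell.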

Have you tested the clustering pipeline on a real note collection? Curious what the clusters looked like.
