GSoC 2026 Proposal Draft – Idea 3: AI-based-Note-categorization

developerzohaib786 · 29 March 2026 19:04

Thanks for the review @shikuz!

1. Context window table: yes, those are swapped. The correct values are BGE-small-en-v1.5 → 512 tokens and all-MiniLM-L6-v2 → 256 tokens (longer input is silently truncated).
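To make the "silently truncates" point concrete, here is a minimal sketch of a pre-embedding check. The limits are the two values quoted above; `truncationInfo` and the idea of passing in a precomputed token count are my own illustration (the real count would come from the model's tokenizer), not code from the prototype.

```typescript
// Context limits quoted in this reply (tokens per input).
const MAX_TOKENS: Record<string, number> = {
  "BGE-small-en-v1.5": 512,
  "all-MiniLM-L6-v2": 256,
};

// Hypothetical helper: reports whether a note of `tokenCount` tokens
// would lose content when embedded with `model`.
function truncationInfo(model: string, tokenCount: number) {
  const limit = MAX_TOKENS[model];
  if (limit === undefined) throw new Error("Unknown model: " + model);
  const dropped = Math.max(0, tokenCount - limit);
  return { limit, truncated: dropped > 0, dropped };
}
```

For example, a 300-token note fits BGE-small-en-v1.5 but loses its last 44 tokens with all-MiniLM-L6-v2, which is the case the plugin could surface as a warning instead of failing quietly.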

2. Mobile: explicitly out of scope. The pipeline depends on Node.js native modules (sqlite3, ONNX Runtime) that are unavailable in the mobile sandbox. The vector store sits behind an abstraction layer, though, so a future contributor could swap sqlite3 for sql.js without touching the embedding or clustering logic.
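A rough sketch of what that abstraction layer could look like. The interface name, method names, and the in-memory class are hypothetical stand-ins for illustration; the prototype's actual interface may differ. A sqlite3-backed or sql.js-backed class would implement the same interface.

```typescript
// Hypothetical storage contract: embedding and clustering code would
// depend only on this interface, never on sqlite3 directly.
interface VectorStore {
  upsert(noteId: string, vector: Float32Array): Promise<void>;
  get(noteId: string): Promise<Float32Array | undefined>;
  all(): Promise<Array<{ noteId: string; vector: Float32Array }>>;
  remove(noteId: string): Promise<void>;
}

// In-memory stand-in used here only to show the shape of a backend.
class InMemoryVectorStore implements VectorStore {
  private rows = new Map<string, Float32Array>();

  async upsert(noteId: string, vector: Float32Array): Promise<void> {
    this.rows.set(noteId, new Float32Array(vector)); // copy to avoid aliasing
  }
  async get(noteId: string): Promise<Float32Array | undefined> {
    return this.rows.get(noteId);
  }
  async all(): Promise<Array<{ noteId: string; vector: Float32Array }>> {
    const out: Array<{ noteId: string; vector: Float32Array }> = [];
    this.rows.forEach((vector, noteId) => out.push({ noteId, vector }));
    return out;
  }
  async remove(noteId: string): Promise<void> {
    this.rows.delete(noteId);
  }
}

// Example caller: works with any backend that satisfies VectorStore.
async function countVectors(store: VectorStore): Promise<number> {
  return (await store.all()).length;
}
```

Because callers such as `countVectors` only see `VectorStore`, swapping the backend for a mobile-friendly one would be a change in one class rather than across the pipeline.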

3. New note: no full re-run

onNoteChange() fires → note embedded, vector saved to sqlite3 (< 1 second)
New vector compared against stored centroids → tentative assignment, no re-clustering
Full re-analysis only triggers on manual Re-analyse click, or when 5%+ of collection has changed
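The three steps above can be sketched roughly as follows. This is an illustration under assumptions, not the prototype's code: I use squared Euclidean distance to match K-Means (cosine similarity would work equally well on normalised embeddings), and `ReanalysisTracker` is a hypothetical name for whatever counts changed notes.

```typescript
type Vec = number[];

// Squared Euclidean distance between two equal-length vectors.
function sqDist(a: Vec, b: Vec): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    s += d * d;
  }
  return s;
}

// Tentative assignment: index of the closest stored centroid (-1 if none).
function assignToNearestCentroid(vector: Vec, centroids: Vec[]): number {
  let best = -1;
  let bestD = Infinity;
  centroids.forEach((c, i) => {
    const d = sqDist(vector, c);
    if (d < bestD) {
      bestD = d;
      best = i;
    }
  });
  return best;
}

// Tracks which notes changed since the last full clustering run.
class ReanalysisTracker {
  private changed = new Set<string>();
  private threshold: number;

  constructor(threshold: number = 0.05) {
    this.threshold = threshold; // 5% of the collection by default
  }
  markChanged(noteId: string): void {
    this.changed.add(noteId); // a Set, so repeated edits count once
  }
  shouldReanalyse(totalNotes: number): boolean {
    return totalNotes > 0 && this.changed.size / totalNotes >= this.threshold;
  }
  reset(): void {
    this.changed.clear(); // call after a full re-analysis
  }
}
```

With 100 notes, four edited notes leave the tracker below the threshold, and the fifth distinct note reaches it. Editing the same note repeatedly does not inflate the count.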

4. Real collection test: yes. I built a working Joplin plugin prototype with an embedded clustering pipeline. It validates the core architecture before any production scaling.

Demo

There is a 10 MB limit on video uploads, so I have uploaded only the last part. Please see the full demo video at

data.json (100 notes) 
  ↓
Embedding extraction (BGE-small-en-v1.5 via Transformers.js in Web Worker)
  ↓
Optional dimensionality reduction (UMAP: 384-dim → 5-dim for tighter separation)
  ↓
K-Means clustering (K=2 to adaptive max)
  ↓
Silhouette scoring (automatic K selection without manual inspection)
  ↓
Final clustering + Benchmark UI (sidebar visualization with metrics)
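For the K-Means and silhouette steps of the pipeline above, here is a simplified sketch of automatic K selection. It skips embedding and UMAP and works on plain number arrays. The farthest-point initialisation is my own choice to keep the example deterministic, and all function names are illustrative rather than taken from the repository.

```typescript
type Point = number[];

function dist(a: Point, b: Point): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
  return Math.sqrt(s);
}

function nearest(p: Point, centroids: Point[]): number {
  let best = 0;
  let bestD = Infinity;
  centroids.forEach((c, i) => {
    const d = dist(p, c);
    if (d < bestD) { bestD = d; best = i; }
  });
  return best;
}

// Lloyd's algorithm with deterministic farthest-point initialisation.
function kMeans(points: Point[], k: number, maxIter: number = 50): number[] {
  const centroids: Point[] = [points[0].slice()];
  while (centroids.length < k) {
    let pick = 0;
    let pickD = -1;
    points.forEach((p, i) => {
      const d = dist(p, centroids[nearest(p, centroids)]);
      if (d > pickD) { pickD = d; pick = i; }
    });
    centroids.push(points[pick].slice());
  }
  let labels: number[] = points.map(() => -1);
  for (let iter = 0; iter < maxIter; iter++) {
    const next = points.map((p) => nearest(p, centroids));
    const changed = next.some((l, i) => l !== labels[i]);
    labels = next;
    if (!changed) break;
    for (let c = 0; c < k; c++) {
      const members = points.filter((_, i) => labels[i] === c);
      if (members.length === 0) continue; // keep the old centroid
      centroids[c] = members[0].map(
        (_, d) => members.reduce((sum, m) => sum + m[d], 0) / members.length
      );
    }
  }
  return labels;
}

// Mean silhouette coefficient; singleton clusters contribute 0.
function silhouette(points: Point[], labels: number[], k: number): number {
  let total = 0;
  for (let i = 0; i < points.length; i++) {
    const sum: number[] = [];
    const count: number[] = [];
    for (let c = 0; c < k; c++) { sum.push(0); count.push(0); }
    for (let j = 0; j < points.length; j++) {
      if (j === i) continue;
      sum[labels[j]] += dist(points[i], points[j]);
      count[labels[j]]++;
    }
    const own = labels[i];
    if (count[own] === 0) continue;
    const a = sum[own] / count[own]; // mean distance within own cluster
    let b = Infinity; // mean distance to the closest other cluster
    for (let c = 0; c < k; c++) {
      if (c !== own && count[c] > 0) b = Math.min(b, sum[c] / count[c]);
    }
    const m = Math.max(a, b);
    if (b !== Infinity && m > 0) total += (b - a) / m;
  }
  return total / points.length;
}

// Try K = 2..maxK and keep the K with the highest mean silhouette.
function selectK(points: Point[], maxK: number) {
  let best = { k: 0, score: -Infinity, labels: [] as number[] };
  const upper = Math.min(maxK, points.length - 1);
  for (let k = 2; k <= upper; k++) {
    const labels = kMeans(points, k);
    const score = silhouette(points, labels, k);
    if (score > best.score) best = { k, score, labels };
  }
  return best;
}
```

On three well-separated groups of points, `selectK(points, 5)` picks K = 3 because splitting a tight group or merging two distant ones both lower the mean silhouette. On real note embeddings the curve is usually flatter, so the chosen K is a starting point rather than a guarantee.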

Repository for my clustering-phase testing (uses dummy data, not live notes)

Will push the corrected proposal with the table fix shortly.

Phase # 1 System Architecture

Phase # 2 System Architecture

I would also like @HahaBill to see my work.
