Thanks for the review @shikuz!
1. Context window table: yes, those are swapped. The correct values are: BGE-small-en-v1.5 → 512 tokens, all-MiniLM-L6-v2 → 256 tokens (longer input is silently truncated).
2. Mobile: explicitly out of scope. The pipeline depends on Node.js native modules (sqlite3, ONNX Runtime) that are unavailable in the mobile sandbox. The vector store sits behind an abstraction layer though, so a future contributor could swap sqlite3 for sql.js without touching the embedding or clustering logic.
3. New note: no full re-run
- onNoteChange() fires → note embedded, vector saved to sqlite3 (< 1 second)
- New vector compared against stored centroids → tentative assignment, no re-clustering
- Full re-analysis only triggers on manual Re-analyse click, or when 5%+ of collection has changed
4. Real collection test: Yes. I built a working Joplin plugin prototype with an embedded clustering pipeline. The implementation validates the core architecture before potential production scaling.
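
To make points 2 and 3 concrete, here is a minimal sketch of how the storage abstraction and the incremental assignment could fit together. It is not the plugin's actual code: `VectorStore`, `InMemoryVectorStore`, `IncrementalAssigner` and `nearestCentroid` are hypothetical names, cosine similarity is an assumed choice of comparison, and the store is synchronous only for brevity (the sqlite3 package is callback-based, so a real backend would likely return Promises).

```typescript
// Hypothetical storage interface: a sqlite3 or sql.js backend would implement the same shape.
interface VectorStore {
  save(noteId: string, vector: number[]): void;
  count(): number;
}

// In-memory stand-in used here so the sketch runs without native modules.
class InMemoryVectorStore implements VectorStore {
  private vectors = new Map<string, number[]>();
  save(noteId: string, vector: number[]): void {
    this.vectors.set(noteId, [...vector]);
  }
  count(): number {
    return this.vectors.size;
  }
}

// Cosine similarity between two equal-length vectors (assumed metric).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Index of the most similar centroid, or -1 when there are no centroids yet.
function nearestCentroid(vector: number[], centroids: number[][]): number {
  let best = -1;
  let bestSim = -Infinity;
  centroids.forEach((c, i) => {
    const sim = cosineSimilarity(vector, c);
    if (sim > bestSim) { bestSim = sim; best = i; }
  });
  return best;
}

class IncrementalAssigner {
  private store: VectorStore;
  private centroids: number[][];
  private threshold: number;
  private changed = new Set<string>();

  constructor(store: VectorStore, centroids: number[][], threshold = 0.05) {
    this.store = store;
    this.centroids = centroids;
    this.threshold = threshold;
  }

  // Takes an already-computed embedding; saves it and returns a tentative cluster index.
  onNoteChange(noteId: string, vector: number[]): number {
    this.store.save(noteId, vector);
    this.changed.add(noteId);
    return nearestCentroid(vector, this.centroids);
  }

  // True once the share of changed notes reaches the threshold (5% by default).
  needsFullReanalysis(): boolean {
    const total = this.store.count();
    return total > 0 && this.changed.size / total >= this.threshold;
  }
}
```

Because `IncrementalAssigner` only sees the `VectorStore` interface, swapping the backend would not touch the assignment logic, which is the property point 2 relies on.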
Demo
There is a 10 MB limit on video uploads, so I have uploaded only the last part. Please see the full demo video at
data.json (100 notes)
↓
Embedding extraction (BGE-small-en-v1.5 via Transformers.js in Web Worker)
↓
Optional dimensionality reduction (UMAP: 384-dim → 5-dim for tighter separation)
↓
K-Means clustering (K=2 to adaptive max)
↓
Silhouette scoring (automatic K selection without manual inspection)
↓
Final clustering + Benchmark UI (sidebar visualization with metrics)
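
The K-Means and silhouette steps of the pipeline above can be sketched roughly as follows. This is an illustration under stated assumptions, not the prototype's code: it uses Euclidean distance, a deterministic farthest-point initialisation (k-means++ is a common alternative), skips the embedding and UMAP stages, and all function names (`kMeans`, `meanSilhouette`, `selectK`) are hypothetical.

```typescript
type Vec = number[];

function dist(a: Vec, b: Vec): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += (a[i] - b[i]) ** 2;
  return Math.sqrt(s);
}

// Deterministic farthest-point initialisation (assumed; keeps the sketch reproducible).
function initCentroids(points: Vec[], k: number): Vec[] {
  const centroids: Vec[] = [points[0]];
  while (centroids.length < k) {
    let best = points[0];
    let bestD = -1;
    for (const p of points) {
      const d = Math.min(...centroids.map(c => dist(p, c)));
      if (d > bestD) { bestD = d; best = p; }
    }
    centroids.push(best);
  }
  return centroids;
}

// Lloyd's algorithm: returns one cluster label per point.
function kMeans(points: Vec[], k: number, maxIter = 50): number[] {
  let centroids = initCentroids(points, k);
  let labels: number[] = new Array<number>(points.length).fill(-1);
  for (let iter = 0; iter < maxIter; iter++) {
    const next = points.map(p => {
      let best = 0;
      for (let c = 1; c < k; c++) {
        if (dist(p, centroids[c]) < dist(p, centroids[best])) best = c;
      }
      return best;
    });
    const changed = next.some((l, i) => l !== labels[i]);
    labels = next;
    if (!changed) break;
    centroids = centroids.map((old, c) => {
      const members = points.filter((_, i) => labels[i] === c);
      if (members.length === 0) return old; // leave an empty cluster's centroid in place
      return old.map((_, d) => members.reduce((s, m) => s + m[d], 0) / members.length);
    });
  }
  return labels;
}

// Mean silhouette coefficient over all points, in [-1, 1].
function meanSilhouette(points: Vec[], labels: number[], k: number): number {
  let total = 0;
  for (let i = 0; i < points.length; i++) {
    const sums = new Array<number>(k).fill(0);
    const counts = new Array<number>(k).fill(0);
    for (let j = 0; j < points.length; j++) {
      if (j === i) continue;
      sums[labels[j]] += dist(points[i], points[j]);
      counts[labels[j]] += 1;
    }
    const own = labels[i];
    if (counts[own] === 0) continue; // singleton cluster scores 0 by convention
    const a = sums[own] / counts[own];
    let b = Infinity;
    for (let c = 0; c < k; c++) {
      if (c !== own && counts[c] > 0) b = Math.min(b, sums[c] / counts[c]);
    }
    if (b === Infinity || Math.max(a, b) === 0) continue;
    total += (b - a) / Math.max(a, b);
  }
  return total / points.length;
}

// Try K = 2..maxK and keep the clustering with the highest mean silhouette.
function selectK(points: Vec[], maxK: number): { k: number; labels: number[]; score: number } {
  let best = { k: 0, labels: [] as number[], score: -Infinity };
  for (let k = 2; k <= Math.min(maxK, points.length - 1); k++) {
    const labels = kMeans(points, k);
    const score = meanSilhouette(points, labels, k);
    if (score > best.score) best = { k, labels, score };
  }
  return best;
}
```

On real 384-dimensional embeddings the same loop applies unchanged; only the cost grows, since the silhouette computation here is quadratic in the number of notes.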
Repository for my clustering-phase testing (with dummy data, not real-time notes)
Will push the corrected proposal with the table fix shortly.
Phase # 1 System Architecture
Phase # 2 System Architecture
I would also like @HahaBill to see my work.

