Sorry, I accidentally posted this in the other proposal thread first.
The model comparison table lists BGE-small-en-v1.5 at 256 tokens and all-MiniLM-L6-v2 at 512. I think those are swapped: BGE-small-en-v1.5 accepts up to 512 tokens, while all-MiniLM-L6-v2 truncates input at 256 by default. Since the context window drives your chunking decisions, does the model choice change once the specs are corrected?
sqlite3 works on desktop but isn't currently available on Joplin mobile. Is mobile out of scope, or have you thought about a storage path that works on both?
The proposal covers incremental re-embedding via the Events API, but what happens to the clusters when a user creates a new note? Does the whole UMAP + K-Means pipeline re-run, or is there a lighter path?
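To make "lighter path" concrete, here is a rough sketch of the kind of thing I have in mind. It is not taken from the proposal: it assigns a new note to its nearest existing centroid and nudges that centroid with a running mean, instead of refitting everything. The names `assign_to_nearest`, `centroids`, and `counts` are made up for illustration, and it assumes the new note's vector is already in the same space the centroids live in (for example, already projected into the reduced space).

```python
import math

def assign_to_nearest(vec, centroids, counts):
    """Assign vec to the closest centroid and update that centroid in place.

    vec       -- the new note's vector (hypothetical input, same space as centroids)
    centroids -- list of centroid vectors from the last full clustering run
    counts    -- number of notes currently assigned to each centroid
    """
    # Pick the centroid with the smallest Euclidean distance to the new vector.
    best = min(range(len(centroids)), key=lambda i: math.dist(vec, centroids[i]))

    # Running-mean update: the centroid moves by 1/n of the way toward the new point.
    counts[best] += 1
    n = counts[best]
    centroids[best] = [c + (v - c) / n for c, v in zip(centroids[best], vec)]
    return best

# Toy example: two clusters with four notes each, then one new note arrives.
centroids = [[0.0, 0.0], [10.0, 10.0]]
counts = [4, 4]
label = assign_to_nearest([9.0, 11.0], centroids, counts)
print(label, centroids[label])
```

A full re-run could then be reserved for when assignments drift too far, for example after some number of new notes or when a new note is far from every centroid. Whether that trade-off is acceptable is exactly what I'm asking about.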
Have you tested the clustering pipeline on a real note collection? Curious what the clusters looked like.