The idea of having a shared embedding index makes sense, but each idea uses embeddings a bit differently. The ideas can be grouped into two types:
- Chunk-based projects
Idea 1 (AI Search) and Idea 4 (Chat with notes) need chunk-level embeddings, since they retrieve specific parts of notes.
- Note-based projects
Idea 3 (Categorisation) and Idea 2 (Note graphs) mostly compare whole notes, so a single vector per note is enough. This can be created by averaging the chunk embeddings.
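To make the averaging step concrete, here is a minimal sketch, assuming the chunk embeddings of one note are available as a NumPy array of shape (n_chunks, dim). The function name `note_vector` and the toy values are hypothetical, and the final normalisation is an added assumption so that dot products between note vectors behave like cosine similarities.

```python
import numpy as np

def note_vector(chunk_embeddings: np.ndarray) -> np.ndarray:
    """Average chunk embeddings (n_chunks x dim) into one note-level vector."""
    mean = chunk_embeddings.mean(axis=0)
    norm = np.linalg.norm(mean)
    # Assumption: scale to unit length so dot products act as cosine similarity.
    return mean / norm if norm > 0 else mean

# Hypothetical note with 3 chunks and 4-dimensional embeddings.
chunks = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
])
vec = note_vector(chunks)
```

A plain mean treats every chunk equally; weighting by chunk length would be another option if long and short chunks should not count the same.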
At the same time, the retrieval logic is not the same for every idea. For example, search and chat would use similarity-based retrieval (possibly with reranking and RAG), while categorisation would use clustering.
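The difference between the two kinds of logic could look roughly like the sketch below. It is only an illustration under stated assumptions: `top_k` stands in for similarity-based retrieval over chunk embeddings, and `kmeans` is a tiny hand-written k-means standing in for whatever clustering method categorisation ends up using. All names and the toy 2-dimensional vectors are hypothetical, and reranking and RAG are left out.

```python
import numpy as np

def normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length (guarding against zero rows)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)

def top_k(query: np.ndarray, index: np.ndarray, k: int = 2) -> np.ndarray:
    """Similarity-based retrieval: indices of the k rows most similar to the query."""
    scores = normalize(index) @ normalize(query[None, :])[0]
    return np.argsort(-scores)[:k]

def kmeans(vectors: np.ndarray, k: int, iters: int = 10) -> np.ndarray:
    """Tiny k-means for illustration; centroids start at the first k vectors."""
    centroids = vectors[:k].copy()
    for _ in range(iters):
        # Distance from every vector to every centroid, then nearest centroid.
        dists = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = vectors[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels

# Toy data: the same values are reused for both cases to keep the example short.
chunk_index = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]])
note_vectors = chunk_index.copy()

hits = top_k(np.array([1.0, 0.05]), chunk_index, k=2)  # search / chat
labels = kmeans(note_vectors, k=2)                      # categorisation
```

Both functions read from the same kind of stored vectors, which is what makes a shared index workable even though the logic on top of it differs per idea.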