GSoC 2026: Opportunities for the AI projects

A shared embedding index makes sense, but each idea uses embeddings a bit differently. The ideas can be grouped into two types:

  1. Chunk-based projects
    Idea 1 (AI Search) and Idea 4 (Chat with notes) need chunk-level embeddings, since they retrieve specific parts of notes.
  2. Note-based projects
    Idea 3 (Categorisation) and Idea 2 (Note graphs) mostly compare whole notes, so a single vector per note is enough. That vector can be derived by averaging the note's chunk embeddings.
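
As a rough illustration of the averaging step, here is a minimal sketch in plain Python. The function name `note_vector` and the L2 normalisation at the end are my own assumptions, not something the ideas specify; a weighted average (for example by chunk length) might work better in practice.

```python
from math import sqrt

def note_vector(chunk_embeddings):
    """Mean-pool a note's chunk embeddings into one note-level vector.

    Hypothetical helper: the result is L2-normalised so that cosine
    similarity between notes reduces to a dot product.
    """
    if not chunk_embeddings:
        raise ValueError("note has no chunks")
    count = len(chunk_embeddings)
    dim = len(chunk_embeddings[0])
    # Component-wise mean over all chunks of the note.
    mean = [sum(vec[i] for vec in chunk_embeddings) / count for i in range(dim)]
    norm = sqrt(sum(x * x for x in mean))
    # Leave an all-zero vector untouched to avoid dividing by zero.
    return [x / norm for x in mean] if norm > 0 else mean

# Toy 2-dimensional embeddings for a note split into two chunks.
chunks = [[1.0, 0.0], [0.0, 1.0]]
vec = note_vector(chunks)
```

With this approach the note-based ideas would not need a separate embedding pass; they could reuse whatever the chunk-based ideas already store.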

At the same time, the retrieval logic is not the same for every idea. For example, search and chat would use similarity-based retrieval (possibly with reranking and RAG), while categorisation would use clustering.
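
To make the difference concrete, here is a small sketch of two consumers reading the same index. Everything in it is an assumption for illustration: the index is modelled as a list of `(id, vector)` pairs, `top_k` stands in for the search/chat retrieval step, and `greedy_cluster` is a deliberately simple threshold-based stand-in for a real clustering algorithm such as k-means.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, index, k=3):
    """Search/chat style: return the ids of the k most similar entries, best first."""
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [item_id for item_id, _ in scored[:k]]

def greedy_cluster(index, threshold=0.8):
    """Categorisation style: put each entry into the first cluster whose
    seed vector is similar enough, otherwise start a new cluster."""
    clusters = []  # list of (seed_vector, member_ids)
    for item_id, vec in index:
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(item_id)
                break
        else:
            clusters.append((vec, [item_id]))
    return [members for _, members in clusters]

# The same toy index feeds both consumers.
index = [("a", [1.0, 0.0]), ("b", [0.9, 0.1]), ("c", [0.0, 1.0])]
hits = top_k([1.0, 0.05], index, k=2)
groups = greedy_cluster(index)
```

The storage layer is shared, while each idea keeps its own logic on top of it. Reranking and the generation step of RAG are left out here on purpose.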