Hi everyone,
I’m interested in working on the “Chat with your note collection using AI” project for GSoC and have started thinking about a possible approach.
My idea is to build a plugin that indexes the user’s notes, creates embeddings, and stores them in a vector database. When the user asks a question, the system would retrieve relevant notes and generate an answer using an LLM (RAG-style workflow).
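To make the retrieval step of this pipeline concrete, here is a minimal sketch: rank note chunks by cosine similarity to the query embedding and keep the top-k. The `Chunk` type and function names are illustrative, not a proposed Joplin API, and a real implementation would get the vectors from an actual embedding model.

```typescript
// Illustrative retrieval step of a RAG pipeline: rank chunks by cosine
// similarity to the query embedding and return the top-k matches.

type Chunk = { noteId: string; text: string; vector: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Return the k chunks most similar to the query vector.
function retrieveTopK(queryVec: number[], index: Chunk[], k: number): Chunk[] {
  return [...index]
    .sort((a, b) => cosine(queryVec, b.vector) - cosine(queryVec, a.vector))
    .slice(0, k);
}
```

The retrieved chunks would then be concatenated into the prompt that is sent to the LLM along with the user's question.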
One thing I’m unsure about is the LLM setup. Since Joplin focuses strongly on privacy and being open source, I’m wondering what approach would be preferred:
• Using local LLMs (for example through something like Ollama) so that notes never leave the user’s machine
• Supporting external APIs like OpenAI as an optional provider
• Or some different architecture that the maintainers would prefer
My current thought was to design the system so that local models are the default, while external APIs could be optionally configured by the user.
I’d really appreciate any feedback or guidance on whether this direction makes sense for Joplin, or if there are better approaches I should consider before starting development.
Thanks!
This is called RAG.
Proper RAG is extremely hard to do with local-only LLMs on an average PC. Retrieval would be slow, and chunk enrichment would take months and cost real money in electricity.
If you want vanilla RAG - the standard kind that uses brute-force or even Markdown-aware chunking, embedding, and sparse+dense retrieval - you should know that its effectiveness is debatable. I’ve read it scores something like ~32% on benchmarks like FinanceBench.
I guess that’s what we already have in the extremely popular plugin called “Jarvis” - definitely check it out.
Now, talking about a proper RAG - with Markdown-aware chunks; hierarchical, intra-note, and inter-note linking-based contextual embeddings; sparse+dense retrieval; and a re-ranker - I don’t know how much time you are given, but let me just assume it’s not enough. You can check out the GitHub project called dsRAG - it’s the only sane implementation I can see that could somehow be incorporated into Joplin.
What I think is more realistic is integrating vanilla RAG into the Joplin search itself.
If there’s a strong commitment to the idea, please ping me with any questions on the topic - I’ll be happy to help.
Thanks a lot for the detailed explanation, that really helped me understand the challenges better.
I was originally thinking about building a basic RAG pipeline (chunking notes → embeddings → vector search → LLM answer). But after reading your comment, I see that doing a proper RAG system with things like hierarchical chunking and reranking would probably be much more complex than what can realistically be done in a GSoC project.
I’ll definitely check out the Jarvis plugin to see what it already does, and I’ll also take a look at the dsRag project you mentioned.
Your idea of integrating a simpler RAG approach into Joplin’s search sounds interesting. It might actually make more sense than building a completely separate chat interface.
Great idea! I have no knowledge about the technical part, but I have a remark from the sidelines: please keep the user interface simple, and don’t feel the need for an extra search panel or other sophisticated extra interfaces. Ignoring the technical integration: if the RAG search is not integrated into the search area where the “normal” search is done, it would create a bad UX. Just a button to switch it on/off would be the best starting point, from my point of view. Of course, all of this is just my own humble opinion…
Thanks for sharing that perspective, it’s really helpful.
I agree that keeping the interface simple is important. Integrating this into the existing search instead of adding a completely separate panel makes a lot of sense from a UX point of view. A simple toggle to enable or disable the AI/RAG search also sounds like a good starting approach.
I’ll definitely keep this in mind while thinking about the design. Thanks again for the suggestion!
Hey @Ankitsingh, welcome to the forum! Jarvis dev here.
This is a great area to work on! Since it was mentioned (thanks @executed!), I'll briefly describe what already exists in Jarvis and where I think the real opportunities are, whether this ends up built on top of Jarvis or as a separate implementation.
Offline vs. online LLMs
IMO supporting both local and API is the right call (Jarvis follows the same approach). The tricky part is defaults. Offline-first is ideal for privacy, but it requires users to set up something like Ollama before they can use the feature, which is more friction than pasting an API key. Jarvis ships a small embedding model (Universal Sentence Encoder) so that note indexing works out of the box with zero setup, but for generation it relies on either a local server or an API. Getting the out-of-the-box experience right without requiring setup is one of the harder UX problems here.
How RAG works in Jarvis
The note database is built by splitting notes along markdown headings and code blocks (not fixed-size windows). Each chunk gets decorated with metadata before embedding: the note title, the full heading path (like Note Title / Section / Subsection), and tags. This means the embedding captures not just what a block says, but where it sits in the structure of the note.
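The heading-based splitting with heading-path metadata described above can be sketched roughly as follows. This is an illustration of the idea, not Jarvis's actual code; for brevity it splits only on headings and omits the code-block handling mentioned above.

```typescript
// Rough sketch of markdown-aware chunking: split a note on headings and
// record each chunk's heading path, so the text sent to the embedding
// model can be decorated with "Note Title / Section / Subsection".

type Block = { headingPath: string[]; text: string };

function chunkByHeadings(noteTitle: string, body: string): Block[] {
  const blocks: Block[] = [];
  const path: string[] = [noteTitle]; // path[0] is always the note title
  let current: string[] = [];

  const flush = () => {
    const text = current.join("\n").trim();
    if (text) blocks.push({ headingPath: [...path], text });
    current = [];
  };

  for (const line of body.split("\n")) {
    const m = line.match(/^(#{1,6})\s+(.*)$/);
    if (m) {
      flush();
      const level = m[1].length; // number of '#' characters
      path.splice(level);        // drop headings at this level or deeper
      path.push(m[2].trim());
    } else {
      current.push(line);
    }
  }
  flush();
  return blocks;
}

// Text actually embedded: the heading path followed by the content.
function embeddingText(block: Block): string {
  return block.headingPath.join(" / ") + "\n" + block.text;
}
```

Embedding `embeddingText(block)` rather than the raw chunk is what lets the vector capture where a block sits in the note's structure, not just what it says.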
When you chat with your notes, the query is embedded and compared against all blocks by cosine similarity. Results are ranked, and the top hits are assembled into context for the LLM, respecting the token budget.
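The "assemble within a token budget" step could look something like the sketch below. The ~4-characters-per-token estimate is a common rough heuristic, not Jarvis's actual token accounting.

```typescript
// Sketch of assembling top-ranked blocks into LLM context under a
// token budget: take blocks in rank order until the budget runs out.

function assembleContext(
  rankedBlocks: string[], // block texts, best match first
  maxTokens: number,
): string {
  const parts: string[] = [];
  let usedTokens = 0;
  for (const text of rankedBlocks) {
    const cost = Math.ceil(text.length / 4); // ~4 chars per token
    if (usedTokens + cost > maxTokens) break; // budget exhausted
    parts.push(text);
    usedTokens += cost;
  }
  return parts.join("\n---\n");
}
```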
There's also a Search: command that lets you combine this with Joplin-style keyword search. This is semantic ranking with keyword filtering on top, which gives you a way to scope results when you know specific terms matter.
On top of that, you can expand retrieved blocks with surrounding context (previous/next blocks from the same note, or the nearest blocks by similarity from anywhere in your notes).
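The same-note part of that expansion is simple to picture: pull in a window of neighboring blocks around each hit. A sketch of that idea (illustrative names, not Jarvis's code):

```typescript
// Expand a retrieved block with its neighbours from the same note,
// using document order; array indices stand in for block positions.

function expandWithNeighbors(
  noteBlocks: string[], // all blocks of one note, in document order
  hitIndex: number,     // index of the retrieved block
  radius = 1,           // blocks to include on each side
): string[] {
  const start = Math.max(0, hitIndex - radius);
  const end = Math.min(noteBlocks.length, hitIndex + radius + 1);
  return noteBlocks.slice(start, end);
}
```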
How it feels
From my personal experience, despite the simple implementation it's surprisingly effective. The main reason, I think, is that with modern context windows (128k+ tokens), you can afford to be generous with retrieval. Stuff enough candidate blocks in there, including some false positives, and a strong LLM will sort through it. The model is basically doing implicit reranking as part of answering your question. At least for personal notes it works better than you'd expect.
Room to grow
Jarvis is still missing a few modern features. It's also not integrated into the main Joplin client.
- Reranking: There's no dedicated reranker (a second, more precise model that re-scores the top results before sending them to the LLM). This could meaningfully improve precision, especially for larger note collections.
- Query decomposition: For complex questions that span multiple topics or notes, breaking the query into sub-queries and retrieving separately for each would help. Right now it's single-shot retrieval.
- Automatic hybrid scoring: Right now keyword search is a manual, binary filter. Fusing keyword scores (like BM25) with vector similarity scores automatically, so that exact term matches get a boost without the user having to think about it, would be another improvement.
- Smarter chunking strategies: The markdown-aware splitting works well, but there's room to experiment with overlapping chunks, or dynamically sizing chunks like dsRAG does.
- Relevant segment extraction (RSE): Dynamically combining adjacent relevant chunks into variable-length segments may also be a valuable addition.
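One simple way to do the automatic hybrid scoring mentioned above is Reciprocal Rank Fusion (RRF), which combines rankings rather than trying to calibrate raw BM25 and cosine scores against each other. A sketch, with illustrative names:

```typescript
// Fuse a keyword ranking and a vector ranking with Reciprocal Rank
// Fusion: each item scores sum(1 / (k + rank)) over the lists it
// appears in, so items ranked well by both retrievers rise to the top.

function rrfFuse(
  keywordRanking: string[], // chunk ids, best first
  vectorRanking: string[],  // chunk ids, best first
  k = 60,                   // standard RRF damping constant
): string[] {
  const score = new Map<string, number>();
  for (const ranking of [keywordRanking, vectorRanking]) {
    ranking.forEach((id, rank) => {
      score.set(id, (score.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...score.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

The appeal for a GSoC-sized project is that RRF needs no score normalization and no user-facing knobs: exact term matches get their boost automatically.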
These are all known problems with well-tested solutions. I haven't had the time to get to them yet. If you're considering this for GSoC, happy to discuss approaches.