Hey @Ankitsingh, welcome to the forum! Jarvis dev here.
This is a great area to work on! Since it was mentioned (thanks @executed!), I'll briefly describe what already exists in Jarvis and where I think the real opportunities are, whether this ends up built on top of Jarvis or as a separate implementation.
Offline vs. online LLMs
IMO supporting both local models and APIs is the right call (Jarvis follows the same approach). The tricky part is defaults. Offline-first is ideal for privacy, but it requires users to set up something like Ollama before they can use the feature, which is more friction than pasting an API key. Jarvis ships a small embedding model (Universal Sentence Encoder) so that note indexing works out of the box with zero setup, but for generation it relies on either a local server or an API. Getting a good zero-setup generation experience is one of the harder UX problems here.
How RAG works in Jarvis
The note database is built by splitting notes along markdown headings and code blocks (not fixed-size windows). Each chunk gets decorated with metadata before embedding: the note title, the full heading path (like Note Title / Section / Subsection), and tags. This means the embedding captures not just what a block says, but where it sits in the structure of the note.
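To make the chunking idea concrete, here's a minimal sketch of heading-aware splitting with metadata decoration. The function name and exact metadata format are mine, not the plugin's actual code, and it only handles headings (the real splitter also respects code blocks):

```python
import re

def chunk_note(title, body, tags):
    """Split a markdown note on headings and prefix each chunk with its
    note title, heading path, and tags before embedding.
    Simplified sketch; illustrative, not Jarvis's actual implementation."""
    chunks = []
    path = []   # current heading path, e.g. ["Section", "Subsection"]
    buf = []

    def flush():
        text = "\n".join(buf).strip()
        if text:
            header = " / ".join([title] + path)
            chunks.append(f"{header}\ntags: {', '.join(tags)}\n{text}")
        buf.clear()

    for line in body.splitlines():
        m = re.match(r"^(#+)\s+(.*)", line)
        if m:
            flush()  # close the chunk that ended at this heading
            level = len(m.group(1))
            # a level-N heading replaces the path from depth N onward
            path = path[:level - 1] + [m.group(2)]
        else:
            buf.append(line)
    flush()
    return chunks
```

A chunk under `## Subsection` inside `# Section` of a note titled "Note Title" would be prefixed with `Note Title / Section / Subsection`, so the embedding encodes its position in the note's structure.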
When you chat with your notes, the query is embedded and compared against all blocks by cosine similarity. Results are ranked, and the top hits are assembled into context for the LLM, respecting the token budget.
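The retrieval step above amounts to a similarity ranking plus greedy packing. A minimal sketch (names are mine; tokens are approximated as whitespace-separated words for illustration):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_context(query_vec, blocks, token_budget):
    """Rank (text, embedding) blocks by similarity to the query and
    greedily pack the top hits into the LLM context within the budget."""
    ranked = sorted(blocks, key=lambda b: cosine(query_vec, b[1]), reverse=True)
    context, used = [], 0
    for text, _ in ranked:
        cost = len(text.split())  # crude token estimate for the sketch
        if used + cost > token_budget:
            continue
        context.append(text)
        used += cost
    return "\n---\n".join(context)
```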
There's also a Search: command that lets you combine this with Joplin-style keyword search. This is semantic ranking with keyword filtering on top, which gives you a way to scope results when you know specific terms matter.
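In code, that scoping behaves like a hard keyword filter followed by semantic ranking of the survivors. A sketch under assumed names (this is not the plugin's actual API):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def scoped_search(query_vec, keywords, blocks):
    """Keep only blocks containing every keyword (binary filter),
    then rank the survivors by cosine similarity to the query."""
    hits = [(text, vec) for text, vec in blocks
            if all(k.lower() in text.lower() for k in keywords)]
    return sorted(hits, key=lambda b: cosine(query_vec, b[1]), reverse=True)
```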
On top of that, you can expand retrieved blocks with surrounding context (previous/next blocks from the same note, or the nearest blocks by similarity from anywhere in your notes).
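The same-note expansion is the simpler of the two modes; it can be sketched as a window over a flat, note-ordered chunk list (the `(note_id, text)` layout is a hypothetical simplification):

```python
def expand_with_neighbours(blocks, hit_index, radius=1):
    """Expand a retrieved block with its previous/next chunks from the
    same note. blocks is a flat list of (note_id, text) in note order."""
    note_id = blocks[hit_index][0]
    lo = max(0, hit_index - radius)
    hi = min(len(blocks), hit_index + radius + 1)
    # drop neighbours that belong to a different note
    return [text for nid, text in blocks[lo:hi] if nid == note_id]
```

The similarity-based variant (nearest blocks from anywhere in your notes) would instead rank all blocks against the retrieved block's embedding.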
How it feels
From my personal experience, despite the simple implementation it's surprisingly effective. The main reason, I think, is that with modern context windows (128k+ tokens), you can afford to be generous with retrieval. Stuff enough candidate blocks in there, including some false positives, and a strong LLM will sort through it. The model is basically doing implicit reranking as part of answering your question. At least for personal notes it works better than you'd expect.
Room to grow
Jarvis is still missing a few modern features. It's also not integrated into the main Joplin client.
- Reranking: There's no dedicated reranker (a second, more precise model that re-scores the top results before sending them to the LLM). This could meaningfully improve precision, especially for larger note collections.
- Query decomposition: For complex questions that span multiple topics or notes, breaking the query into sub-queries and retrieving separately for each would help. Right now it's single-shot retrieval.
- Automatic hybrid scoring: Right now keyword search is a manual, binary filter. Fusing keyword scores (like BM25) with vector similarity scores automatically, so that exact term matches get a boost without the user having to think about it, would be another improvement.
- Smarter chunking strategies: The markdown-aware splitting works well, but there's room to experiment with overlapping chunks, or with dynamically sizing chunks the way dsRAG does. Relevant segment extraction (RSE), which dynamically combines adjacent relevant chunks into variable-length segments, may also be a valuable addition.
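As one example, the hybrid-scoring item is often tackled with reciprocal rank fusion (RRF), which combines ranked lists from keyword and vector search without the user doing anything. A minimal sketch (function name is mine, not from the plugin):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked result lists (e.g. one from BM25, one from
    vector search) into a single ranking. An item scores 1/(k + rank + 1)
    in each list it appears in, so items ranking well anywhere get a
    boost; k=60 is the conventional damping constant."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```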
These are all known problems with well-tested solutions. I haven't had the time to get to them yet. If you're considering this for GSoC, happy to discuss approaches.