GSoC 2026 Proposal Draft – Idea 4: Chat with your Note Collection using AI – Dipanshu Rawat
Links
- Project idea: https://joplinapp.org/gsoc2026/ideas/
- GitHub: dipanshurdev (Dipanshu Rawat)
- Forum introduction: Welcome to GSoC 2026 with Joplin! - #58 by dipanshurdev
- Pull requests to Joplin:
- Other relevant experience:
- StudyOS AI – AI study platform I built using HuggingFace for automated note analysis and MCQ generation
- Portfolio
1. Introduction
I'm Dipanshu Rawat, a software developer from India with about two years of professional experience. I worked at Vojic LLC building VMeet, an AI meeting assistant that handles real-time transcription and summarization. Before that I was at Claritel.ai integrating ML pipelines into a FastAPI/TypeScript backend. I've worked with HuggingFace models and embedding-based systems in production, so the retrieval side of this project is familiar ground.
I've been contributing to Joplin's core since early 2026: UI state management, TypeScript utility optimizations, and localization. Those contributions gave me a working understanding of the plugin API and its constraints, which shaped the technical decisions in this proposal.
2. Project Summary
The problem. Joplin's search is excellent at finding notes containing specific keywords. It can't answer questions. Someone who has spent years clipping articles, writing research notes, and building a knowledge base has no way to interrogate that collection; they search, open several notes, and piece the answer together manually.
What already exists. Jarvis already implements a RAG pipeline inside Joplin: heading-aware chunking, metadata decoration, cosine similarity retrieval, context expansion, and a Search: command that combines semantic and keyword lookup. It works surprisingly well. @shikuz (Jarvis developer and primary mentor) has described it as "effective despite the simple implementation."
What this project adds. This proposal addresses the four gaps the mentor specifically listed as missing from Jarvis:
- A dedicated cross-encoder re-ranker for better precision
- Query decomposition for complex multi-topic questions
- Automatic hybrid scoring (BM25 + vector) instead of a manual keyword toggle
- Relevant Segment Extraction (RSE) for more coherent context assembly
The result is a conversational chat panel inside Joplin with clickable citations back to source notes, an improved retrieval pipeline, and a zero-setup experience: the embedding model ships with the plugin, so no Ollama is required to get started.
Out of scope for v1.0:
- Fine-tuning or training LLMs
- Indexing images or PDF attachments
- Mobile/web support
3. Technical Approach
What I learned from Jarvis before writing this
I read through Jarvis's source and @shikuz's forum post carefully before deciding what to build. The chunking strategy (heading-first, code-block-aware) already works well. I'm using the same approach with a sliding window fallback for notes without headings. The retrieval foundation is solid. This project's value is in the pipeline stages that come after basic cosine similarity retrieval.
How Joplin's plugin API exposes notes
The note fetching API uses a paginated data.get() call. Reading through the plugin API source showed that each page returns up to the requested limit along with a has_more boolean, which drives the batch loop:
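A minimal sketch of that loop, assuming the standard joplin.data.get() pagination contract (the field list and batch size here are illustrative, not final):

```typescript
import joplin from 'api';

// Fetch every note in pages of `batchSize`; `has_more` drives the loop.
async function fetchAllNotes(batchSize = 50) {
	const notes: any[] = [];
	let page = 1;
	while (true) {
		const response = await joplin.data.get(['notes'], {
			fields: ['id', 'title', 'body', 'parent_id', 'updated_time'],
			page,
			limit: batchSize,
		});
		notes.push(...response.items);
		if (!response.has_more) break;
		page++;
	}
	return notes;
}
```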
The parent_id field is the notebook ID; this is what notebook scoping filters against at index time. Tags are fetched separately via a linked resource call:
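A sketch of that lookup (field list illustrative):

```typescript
// Tags come from the notes/:id/tags linked resource.
async function fetchNoteTags(noteId: string): Promise<string[]> {
	const response = await joplin.data.get(['notes', noteId, 'tags'], {
		fields: ['id', 'title'],
	});
	return response.items.map((tag: any) => tag.title);
}
```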
Both of these get stored as metadata on each chunk so vectra can filter on parent_id = 'notebookId' without re-fetching from the API at query time.
How panels communicate with the plugin backend
Joplin panels are webviews, so they can't call plugin API methods directly. Communication goes through a message-passing bridge:
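A minimal sketch of the plugin side of that bridge; runQueryPipeline is a hypothetical entry point into the retrieval pipeline, and the message shapes mirror the IPC schema later in this proposal:

```typescript
const panel = await joplin.views.panels.create('chatPanel');

await joplin.views.panels.onMessage(panel, async (message: any) => {
	if (message.type === 'query') {
		// Run retrieval + generation, then push the answer back to the panel.
		const result = await runQueryPipeline(message.text, message.notebookFilter, message.tagFilter);
		await joplin.views.panels.postMessage(panel, { type: 'answer', ...result });
	} else if (message.type === 'openNote') {
		await joplin.commands.execute('openNote', message.noteId);
	}
});
```

On the webview side, the React app sends webviewApi.postMessage({ type: 'query', ... }) and listens for token and answer messages via webviewApi.onMessage.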
The main technical constraint: no native packages in plugins
Joplin's API documentation states explicitly that native packages cannot be bundled with plugins because they need to work cross-platform. This rules out options like sqlite-vec (C extension) or anything with compiled .node binaries.
For vector storage, the right tool here is vectra, a pure TypeScript/JavaScript local vector database that stores its index as JSON files on disk. No native dependencies, no compilation, works inside a Joplin plugin. It loads the index into memory for fast cosine similarity search (typically 1–2ms even for large indexes) and has built-in metadata filtering and hybrid BM25+vector retrieval. For a typical Joplin vault this is more than sufficient.
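To make the storage layer concrete, here is roughly how I expect to use it; the API shape follows vectra's README, and the exact signatures (especially metadata filtering) will be validated during Community Bonding:

```typescript
import { LocalIndex } from 'vectra';
import * as path from 'path';

// The index lives in the plugin's data directory (see the Privacy section).
const index = new LocalIndex(path.join(await joplin.plugins.dataDir(), 'vector-index'));
if (!(await index.isIndexCreated())) await index.createIndex();

// Insert one chunk with the metadata used for scoping and citations.
await index.insertItem({
	vector: embedding, // number[] produced by the embedding model
	metadata: { noteId, parentId, headingPath, chunkIndex },
});

// Top-20 cosine similarity lookup; notebook/tag filtering is applied against
// the stored metadata (exact filter syntax to be confirmed).
const candidates = await index.queryItems(queryEmbedding, 20);
```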
For embeddings, Transformers.js with an ONNX/WASM backend works inside Joplin; Jarvis already ships a bundled local embedding model (Universal Sentence Encoder), confirming that in-plugin inference is a viable path.
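A sketch of the embedding step (the model ID follows the Xenova ONNX naming convention and is an assumption to confirm against the published weights):

```typescript
import { pipeline } from '@xenova/transformers';

// Load the bundled BGE-small model once; reuse the pipeline for every chunk.
const embedder = await pipeline('feature-extraction', 'Xenova/bge-small-en-v1.5');

async function embed(text: string): Promise<number[]> {
	// Mean pooling + normalisation yields a 384-dimensional unit vector.
	const output = await embedder(text, { pooling: 'mean', normalize: true });
	return Array.from(output.data as Float32Array);
}
```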
Technology decisions
Vector store
| Option | Why | Decision |
|---|---|---|
| vectra | Pure TypeScript, zero native dependencies. File-backed JSON. Built-in cosine similarity + BM25 hybrid. Works in Joplin's plugin environment. | Chosen |
| In-memory array | Zero dependencies, fast, but lost on restart and RAM-intensive for large vaults. | Rejected |
| sqlite-vec | Fast and well-designed, but requires native C bindings; not compatible with Joplin's no-native-packages constraint. | Rejected |
| LanceDB, Chroma, Qdrant | Require native binaries or a running server. | Rejected |
Embedding model
Scores from the MTEB leaderboard, retrieval task:
| Model | MTEB Retrieval | Size | How it runs | Decision |
|---|---|---|---|---|
| BGE-small-en-v1.5 | 51.7 | 24 MB | Transformers.js WASM | Chosen (bundled default) |
| all-MiniLM-L6-v2 | 49.1 | 23 MB | Transformers.js WASM | Not chosen |
| nomic-embed-text | ~62.3 | 274 MB | Ollama | Not chosen (requires Ollama) |
| text-embedding-3-small | ~62.3 | API (hosted) | OpenAI API | Not chosen (requires API key) |
BGE-small ships bundled with the plugin, so indexing works immediately after installation: no Ollama setup, no API key. At nearly identical size, it scores 51.7 vs MiniLM's 49.1 on MTEB retrieval, a gap worth having.
Chunking
Same heading-first approach Jarvis uses (it works), with additions:
- Notes without headings fall through to a 400-token sliding window, 50-token overlap
- Sections under ~80 tokens get merged with the next section
- Each chunk stores: note ID, note title, heading path (e.g. Research / ML / Transformers), tags, chunk index
The heading path is what makes citations useful; users see not just which note answered their question, but which section of it.
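For reference, the shape of a stored chunk I have in mind (field names are illustrative, not final):

```typescript
// One indexed chunk; sectionId and hash support RSE and change detection below.
interface NoteChunk {
	noteId: string;       // Joplin note ID, used for citations and openNote
	noteTitle: string;
	headingPath: string;  // e.g. "Research / ML / Transformers"
	sectionId: string;    // blocks cross-section merging during RSE
	chunkIndex: number;   // position within the note, for contiguity checks
	parentId: string;     // notebook ID, for scoped retrieval
	tags: string[];
	hash: string;         // SHA-256 of the source note body at index time
	text: string;
}
```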
The four retrieval improvements
1. Cross-encoder re-ranking
vectra's cosine similarity finds chunks that are topically close to the query. A cross-encoder evaluates query and chunk together, which is more accurate at spotting whether a chunk actually answers the question. The pipeline is cosine similarity to retrieve top 20 candidates, then cross-encoder/ms-marco-MiniLM-L-6-v2 via Transformers.js to re-score and select top 5. Around 150ms added latency on the CPU is acceptable when the query is going to an LLM anyway.
The mentor noted that large context windows let a strong LLM implicitly re-rank by filtering false positives. That works but wastes tokens on every query. Explicit re-ranking keeps the prompt tight.
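A sketch of the re-scoring step; the model ID and tokenizer usage are assumptions that the Community Bonding WASM proof-of-concept will validate:

```typescript
import { AutoTokenizer, AutoModelForSequenceClassification } from '@xenova/transformers';

const rerankerId = 'Xenova/ms-marco-MiniLM-L-6-v2';
const tokenizer = await AutoTokenizer.from_pretrained(rerankerId);
const reranker = await AutoModelForSequenceClassification.from_pretrained(rerankerId);

// Score each (query, chunk) pair jointly and keep the best `keep` chunks.
async function rerank(query: string, chunks: NoteChunk[], keep = 5): Promise<NoteChunk[]> {
	const inputs = tokenizer(
		chunks.map(() => query),
		{ text_pair: chunks.map(c => c.text), padding: true, truncation: true },
	);
	const { logits } = await reranker(inputs);
	const scores = Array.from(logits.data as Float32Array); // one relevance logit per pair
	return chunks
		.map((chunk, i) => ({ chunk, score: scores[i] }))
		.sort((a, b) => b.score - a.score)
		.slice(0, keep)
		.map(x => x.chunk);
}
```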
2. Hybrid BM25 + vector scoring
vectra has built-in support for hybrid BM25+vector retrieval (the isBm25 flag). This means exact term matches get a boost without the user needing to manually switch to keyword search mode. When a user asks about a specific person's name or a technical library, BM25 ensures those exact matches surface even if the semantic embedding similarity is imprecise. For conceptual questions, vector similarity leads. The two are combined automatically via Reciprocal Rank Fusion.
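For illustration, the fusion step itself is small; a minimal Reciprocal Rank Fusion merge over the two orderings looks like this (independent of whatever vectra does internally):

```typescript
// RRF: score(doc) = sum over result lists of 1 / (k + rank); k = 60 is the
// conventional constant from the original RRF paper.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
	const scores = new Map<string, number>();
	for (const ranking of rankings) {
		ranking.forEach((id, rank) => {
			scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
		});
	}
	return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

// Usage: merge the BM25 ordering and the vector-similarity ordering.
const fusedIds = reciprocalRankFusion([bm25ResultIds, vectorResultIds]);
```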
3. Query decomposition
A question like "compare what I wrote about React vs Vue in terms of performance" spans two topics. Single-shot retrieval will favour whichever topic has more notes. The fix is to detect multi-topic questions, generate sub-queries ("React performance", "Vue performance"), retrieve for each independently, and merge the results before re-ranking. Single-topic questions skip this step entirely.
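A sketch of the control flow; generateSubQueries (an LLM call that splits the question, or returns it unchanged for single-topic input) and hybridRetrieve are hypothetical helpers:

```typescript
async function retrieveWithDecomposition(question: string): Promise<NoteChunk[]> {
	const subQueries = await generateSubQueries(question); // e.g. ["React performance", "Vue performance"]
	const perQuery = await Promise.all(subQueries.map(q => hybridRetrieve(q, 20)));

	// De-duplicate by chunk identity, then hand everything to the shared re-ranker.
	const seen = new Set<string>();
	const merged: NoteChunk[] = [];
	for (const chunk of perQuery.flat()) {
		const key = `${chunk.noteId}:${chunk.chunkIndex}`;
		if (!seen.has(key)) {
			seen.add(key);
			merged.push(chunk);
		}
	}
	return rerank(question, merged);
}
```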
4. Relevant Segment Extraction (RSE)
After re-ranking, some high-scoring chunks may be adjacent in the original note and read better as one continuous passage. RSE detects when consecutive chunks from the same note are contiguous and merges them into a single segment before prompt injection. This avoids the mid-sentence retrieval boundaries that can make answers feel disconnected from their source. Cross-section merging is blocked using a section_id field stored per chunk.
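A sketch of the merge, using the chunkIndex and sectionId fields described earlier:

```typescript
// Merge chunks that are adjacent within the same note and section into one
// segment; cross-section merging is blocked by the sectionId check.
function mergeContiguousChunks(chunks: NoteChunk[]): NoteChunk[] {
	const sorted = [...chunks].sort((a, b) =>
		a.noteId === b.noteId ? a.chunkIndex - b.chunkIndex : a.noteId.localeCompare(b.noteId));
	const segments: NoteChunk[] = [];
	for (const chunk of sorted) {
		const last = segments[segments.length - 1];
		const contiguous = last
			&& last.noteId === chunk.noteId
			&& last.sectionId === chunk.sectionId
			&& chunk.chunkIndex === last.chunkIndex + 1;
		if (contiguous) {
			last.text += '\n' + chunk.text; // extend the running segment
			last.chunkIndex = chunk.chunkIndex;
		} else {
			segments.push({ ...chunk });
		}
	}
	return segments;
}
```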
Indexing and keeping it current
Initial sync: All notes are fetched via joplin.data.get() in batches of 50, processed in a background worker_threads instance. The main Electron thread stays free.
Change detection: Each note's body is hashed (SHA-256). The hash is stored alongside its vectors. On any re-index run, notes whose hash hasn't changed are skipped. This makes routine re-index passes fast.
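A sketch of the check; getStoredHash is a hypothetical lookup into the index metadata:

```typescript
import { createHash } from 'crypto';

function noteHash(body: string): string {
	return createHash('sha256').update(body).digest('hex');
}

// A note is re-chunked and re-embedded only if its body hash changed.
async function needsReindex(note: { id: string; body: string }): Promise<boolean> {
	return (await getStoredHash(note.id)) !== noteHash(note.body);
}
```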
Event-driven updates: joplin.workspace.onNoteChange fires when the currently open note changes. This handles the common case: the user edits a note they're looking at, and the index updates within seconds.
Periodic background polling: onNoteChange only fires for the currently selected note, not for bulk changes (imports, sync). To catch those, a lightweight background timer compares note updated_time values against the last-indexed timestamps and queues anything that's changed. This runs infrequently (every few minutes) and is near-instant for vaults that haven't changed.
Architecture Overview
UI
The plugin uses joplin.views.panels.create() for a persistent sidebar rather than a dialog, since a dialog can't hold an ongoing conversation. The panel loads a React app via the plugin's HTML injection system.
A [ + Context Filter ] dropdown lets users scope queries to specific notebooks or tags. When active, vectra's metadata filtering handles this; no separate query path is needed.
Citations work by prompting the LLM to include [note_id] markers in its response. The React frontend parses these and renders them as clickable badges. Clicking one runs joplin.commands.execute('openNote', noteId) which navigates the main Joplin window to that note. The answer and its source are one click apart.
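A sketch of the citation parsing on the panel side (the regex assumes Joplin's 32-character hex note IDs):

```typescript
// Find [note_id] markers in the LLM's answer and turn them into badge data.
const CITATION_RE = /\[([0-9a-f]{32})\]/g;

function extractCitations(answer: string): string[] {
	return [...answer.matchAll(CITATION_RE)].map(m => m[1]);
}

// A badge click posts { type: 'openNote', noteId } to the plugin process,
// which then runs: await joplin.commands.execute('openNote', noteId);
```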
Prompt construction and multi-turn design
The prompt has three layers assembled in this order: system instruction (~150 tokens, fixed), retrieved chunks (top 5 after reranking, each capped at 400 tokens, so ~2,000 tokens maximum), and conversation history.
Token budgeting works as follows: after the chunks and system prompt are placed, the remaining budget is calculated, and conversation history fills it until the budget is exhausted. For a typical 8k-context model this comfortably fits several turns. For smaller local models the history window shrinks automatically; the logic drops the oldest turns first, never the chunks.
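A sketch of that budgeting logic; countTokens is a hypothetical tokenizer helper, and the reserve for the model's reply is an assumption:

```typescript
function buildHistoryWindow(turns: string[], contextWindow: number, chunkTokens: number): string[] {
	const SYSTEM_TOKENS = 150;   // fixed system instruction, as above
	const ANSWER_RESERVE = 512;  // room for the model's reply (assumption)
	let budget = contextWindow - SYSTEM_TOKENS - chunkTokens - ANSWER_RESERVE;

	// Walk from the most recent turn backwards so the oldest turns drop first.
	const kept: string[] = [];
	for (let i = turns.length - 1; i >= 0; i--) {
		const cost = countTokens(turns[i]);
		if (cost > budget) break;
		budget -= cost;
		kept.unshift(turns[i]);
	}
	return kept;
}
```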
For multi-turn retrieval specifically, conversation history feeds the query rewriting step before embedding. The new question is rewritten into a standalone query using the last 2–3 turns as context. So a follow-up like "what about the second approach?" becomes "what is the second approach to X that was discussed", which actually retrieves meaningfully instead of matching nothing.
IPC message schema
| Direction | type | Payload |
|---|---|---|
| Panel → Plugin | query | { text, notebookFilter[], tagFilter[] } |
| Plugin → Panel | token | { text } (streaming chunk) |
| Plugin → Panel | answer | { text, citations[] } |
| Plugin → Panel | indexProgress | { current, total } |
| Plugin → Panel | error | { message, code } |
| Panel → Plugin | openNote | { noteId } |
Privacy
Default (local) mode: BGE-small runs via WASM, Ollama handles generation; no note content leaves the machine. The vectra index lives in joplin.plugins.dataDir() and is not part of Joplin's sync.
Cloud mode: only the question and the retrieved chunks (a few paragraphs) are sent over HTTPS. The full vault is never transmitted. Users see a clear warning before this mode is enabled. API keys are stored through Joplin's settings API, never in plaintext.
Potential challenges
Transformers.js WASM in Joplin's Webpack environment. Jarvis ships a model this way, confirming it's possible. But WASM module loading can have configuration quirks in specific Webpack setups. I'll validate this in the first week of Community Bonding with a minimal proof-of-concept before writing the main indexing pipeline.
vectra memory usage for large vaults. vectra loads the entire index into memory. A vault with 10,000 notes at ~5 chunks each and 384-dimensional float32 vectors is roughly 75 MB of index data in RAM: manageable, but worth measuring. If a user has an unusually large vault, I'll add a configurable chunk limit or document the limitation.
Cross-encoder latency on slow machines. 150ms re-ranking over 20 chunks is fine on a modern machine. On older hardware it could be noticeable. I'll make re-ranking optional and off by default for users who want faster responses.
Vectra on mobile and Transformers.js compatibility. Vectra's file-based storage relies on Node.js fs; mobile is already out of scope for v1.0, but it's worth understanding the constraint now rather than discovering it blocks a future port. Before writing the main indexing pipeline, I'll study @HahaBill's AI summarization plugin to understand exactly how Transformers.js runs inside Joplin's plugin sandbox, since that's the closest existing reference for this kind of ML integration in the codebase.
Testing
- Unit tests: chunking edge cases (no headings, code blocks at section boundaries, empty notes), hash change detection, vectra CRUD operations
- Integration test: index a 300-note test vault, modify one note, verify only that note's chunks are updated
- Retrieval quality: a set of 15 question/expected-source-note pairs run against both pipelines, cosine-only and cosine+reranker, comparing MRR (Mean Reciprocal Rank) and Hit@3. If the correct source note appears at rank 1 more often with reranking than without, that confirms the improvement is real. Results from both pipelines are logged so the comparison is explicit rather than subjective.
- Manual testing on a real Joplin vault of reasonable size before submission
Documentation
- User guide: installation, first-run indexing, Ollama setup, API key configuration, privacy explanation, notebook scoping
- Developer notes: the internal module boundaries, IPC schema, and how to swap the embedding or storage layer are relevant if a shared infrastructure emerges from the other AI GSoC projects
4. Implementation Plan
Community Bonding (May 4 – May 26)
The two things I want to confirm before coding starts:
- Transformers.js WASM loading works correctly in Joplin's Webpack/Electron build (minimal test plugin)
- vectra indexing and querying works inside a plugin's sandboxed Node.js environment
Beyond validation: read Jarvis source in depth, finalise the module interface design with @shikuz, set up the repository and CI/CD.
Phase 1: Indexer (May 27 – June 24)
Week 1 — Chunking: heading-aware splitter, sliding window fallback, section merging, unit tests for edge cases
Week 2 — Embedding pipeline: BGE-small via Transformers.js, running in worker_threads
Week 3 — vectra integration: insert, update (delete + reinsert by note ID), delete, metadata filters for notebook/tag scoping
Week 4 — Incremental indexing: SHA-256 hashing, onNoteChange hook, periodic polling for bulk changes, IPC progress messages
Midterm checkpoint: given a Joplin vault, the indexer chunks everything, stores vectors in vectra, and correctly handles note edits and deletions on re-run.
Phase 2: Query Pipeline (June 25 – July 22)
Week 5 — Hybrid retrieval: vectra's built-in BM25+vector scoring with RRF fusion
Week 6 — Cross-encoder re-ranking: top 20 → top 5 via Transformers.js
Week 7 — Query decomposition: multi-topic detection, sub-query generation, result merging
Week 8 — RSE: contiguous chunk merging with cross-section boundary protection; prompt builder with token budget management; Ollama + OpenAI/Anthropic integration
Phase 3: UI (July 23 – August 12)
Week 9 — React chat panel: message layout, typing indicator, status bar, first-run progress
Week 10 — IPC bridge and streaming: token-by-token rendering
Week 11 — Context Filter UI (notebook/tag scoping); settings panel (API keys, model selection, excluded notebooks)
Week 12 — Citation engine: [note_id] parsing, clickable badges, Sources accordion, openNote integration
Phase 4: Testing and Docs (August 13 – September 1)
Week 13 — Large vault testing, retrieval quality checks, bug fixing
Week 14 — User and developer documentation
Final week — Code review, PR cleanup, plugin registry submission
5. Deliverables
- A Joplin plugin installable from the plugin registry
- Zero-setup indexing: BGE-small bundled and works without Ollama or an API key
- Background indexer with hash-based change detection and event + polling sync
- Hybrid BM25+vector retrieval with automatic fusion
- Cross-encoder re-ranking
- Query decomposition for multi-topic questions
- Relevant Segment Extraction
- React sidebar chat with streaming, scoping, and clickable citations
- Local LLM support via Ollama; optional cloud API support
- Unit and integration test suite
- User and developer documentation
6. Availability
Hours per week: 40–45
Time zone: IST (UTC+5:30)
Other commitments: No conflicting internships or jobs. College coursework will not affect committed GSoC hours.
I'll post weekly progress updates on the Joplin forum and stay available on Discord throughout the program.







