GSoC 2026 Proposal Draft – Idea 4: Chat with your Note Collection using AI – Dipanshu Rawat

1. Introduction

I'm Dipanshu Rawat, a software developer from India with about two years of professional experience. I worked at Vojic LLC building VMeet, an AI meeting assistant that handles real-time transcription and summarization. Before that I was at Claritel.ai integrating ML pipelines into a FastAPI/TypeScript backend. I've worked with HuggingFace models and embedding-based systems in production, so the retrieval side of this project is familiar ground.

I've been contributing to Joplin's core since early 2026: UI state management, TypeScript utility optimizations, and localization. Those contributions gave me a working understanding of the plugin API and its constraints, which shaped the technical decisions in this proposal.


2. Project Summary

The problem. Joplin's search is excellent at finding notes containing specific keywords. It can't answer questions. Someone who has spent years clipping articles, writing research notes, and building a knowledge base has no way to interrogate that collection; they search, open several notes, and piece the answer together manually.

What already exists. Jarvis already implements a RAG pipeline inside Joplin: heading-aware chunking, metadata decoration, cosine similarity retrieval, context expansion, and a Search: command that combines semantic and keyword lookup. It works surprisingly well. @shikuz (Jarvis developer and primary mentor) has described it as "effective despite the simple implementation."

What this project adds. This proposal addresses the four gaps the mentor specifically listed as missing from Jarvis:

  1. A dedicated cross-encoder re-ranker for better precision
  2. Query decomposition for complex multi-topic questions
  3. Automatic hybrid scoring (BM25 + vector) instead of a manual keyword toggle
  4. Relevant Segment Extraction (RSE) for more coherent context assembly

The result is a conversational chat panel inside Joplin with clickable citations back to source notes, an improved retrieval pipeline, and a zero-setup experience: the embedding model ships with the plugin, so no Ollama is required to get started.

Out of scope for v1.0:

  • Fine-tuning or training LLMs
  • Indexing images or PDF attachments
  • Mobile/web support

3. Technical Approach

What I learned from Jarvis before writing this

I read through Jarvis's source and @shikuz's forum post carefully before deciding what to build. The chunking strategy (heading-first, code-block-aware) already works well. I'm using the same approach with a sliding window fallback for notes without headings. The retrieval foundation is solid. This project's value is in the pipeline stages that come after basic cosine similarity retrieval.

How Joplin's plugin API exposes notes

The note fetching API uses a paginated data.get() call. According to the plugin API source, each page returns up to the requested limit together with a has_more boolean that drives the batch loop:
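A sketch of that loop, with `fetchPage` standing in for the `joplin.data.get(['notes'], { fields, page, limit })` call so the control flow can be exercised outside Joplin:

```typescript
// Minimal sketch of the paginated batch loop. `fetchPage` is an injected
// stand-in for joplin.data.get(['notes'], { fields, page, limit });
// the { items, has_more } page shape follows the Joplin data API.
interface Note { id: string; title: string; body: string; parent_id: string; }
interface Page { items: Note[]; has_more: boolean; }

async function fetchAllNotes(
  fetchPage: (page: number) => Promise<Page>,
): Promise<Note[]> {
  const notes: Note[] = [];
  let page = 1;
  let hasMore = true;
  while (hasMore) {
    const res = await fetchPage(page); // one batch per call
    notes.push(...res.items);
    hasMore = res.has_more;            // drives the loop
    page += 1;
  }
  return notes;
}
```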

The parent_id field is the notebook ID; this is what notebook scoping
filters against at index time. Tags are fetched separately via a linked
resource call:
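A minimal sketch of that call, with the data API injected as `dataGet` (a stand-in for `joplin.data.get`, whose linked-resource path for tags is `['notes', id, 'tags']`):

```typescript
// Fetch the tags linked to one note. `dataGet` is an injected stand-in
// for joplin.data.get so the helper is testable outside the plugin.
type DataGet = (
  path: string[],
  query: { fields: string[] },
) => Promise<{ items: Array<{ id: string; title: string }> }>;

async function fetchNoteTags(dataGet: DataGet, noteId: string): Promise<string[]> {
  // Linked-resource path: GET /notes/:id/tags
  const res = await dataGet(['notes', noteId, 'tags'], { fields: ['id', 'title'] });
  return res.items.map((t) => t.title);
}
```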

Both of these are stored as metadata on each chunk so vectra can filter WHERE parent_id = 'notebookId' without re-fetching from the API at query time.

How panels communicate with the plugin backend

Joplin panels are webviews. They can't call plugin API methods directly.
Communication goes through a message-passing bridge:
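A sketch of the plugin side of that bridge. `handleMessage` is written as a pure dispatcher so it can be tested in isolation; in the real plugin it would be registered with `joplin.views.panels.onMessage`, with replies sent via `postMessage`. `runPipeline` is a hypothetical stand-in for the retrieval + LLM pipeline:

```typescript
// Message shapes for the panel <-> plugin bridge (a subset of the IPC
// schema defined elsewhere in this proposal).
type PanelMsg =
  | { type: 'query'; text: string }
  | { type: 'openNote'; noteId: string };

type PluginMsg =
  | { type: 'answer'; text: string; citations: string[] }
  | { type: 'error'; message: string };

// Pure dispatcher: the plugin would register this with
// joplin.views.panels.onMessage(panel, msg => handleMessage(msg, runPipeline))
// and forward non-null results with joplin.views.panels.postMessage.
async function handleMessage(
  msg: PanelMsg,
  runPipeline: (q: string) => Promise<{ text: string; citations: string[] }>,
): Promise<PluginMsg | null> {
  if (msg.type === 'query') {
    try {
      const res = await runPipeline(msg.text);
      return { type: 'answer', text: res.text, citations: res.citations };
    } catch (e) {
      return { type: 'error', message: String(e) };
    }
  }
  // 'openNote' is side-effect only: the plugin would call
  // joplin.commands.execute('openNote', msg.noteId) here.
  return null;
}
```

On the panel side, the injected `webviewApi.postMessage` sends these messages upward.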

The main technical constraint: no native packages in plugins

Joplin's API documentation states explicitly that native packages cannot be bundled with plugins because they need to work cross-platform. This rules out options like sqlite-vec (C extension) or anything with compiled .node binaries.

For vector storage, the right tool here is vectra, a pure TypeScript/JavaScript local vector database that stores its index as JSON files on disk. No native dependencies, no compilation, works inside a Joplin plugin. It loads the index into memory for fast cosine similarity search (typically 1–2ms even for large indexes) and has built-in metadata filtering and hybrid BM25+vector retrieval. For a typical Joplin vault this is more than sufficient.

For embeddings, Transformers.js with ONNX/WASM backend works inside Joplin; Jarvis already ships Universal Sentence Encoder this way, confirming it's a viable path.
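The Transformers.js call for BGE-small looks like the commented lines below (`Xenova/bge-small-en-v1.5` is the ONNX conversion on the HuggingFace hub). Since the returned vectors are normalized, cosine similarity reduces to a dot product; the similarity helper is the runnable part of this sketch:

```typescript
// Embedding sketch. The Transformers.js usage (commented, since it
// needs the model download at runtime) follows the library's
// feature-extraction pipeline API:
//
//   import { pipeline } from '@xenova/transformers';
//   const embed = await pipeline('feature-extraction', 'Xenova/bge-small-en-v1.5');
//   const out = await embed(text, { pooling: 'mean', normalize: true });
//   const vector = Array.from(out.data as Float32Array);
//
// Cosine similarity between two vectors (normalization handled here so
// the helper also works on un-normalized inputs):
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```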

Technology decisions

Vector store

| Option | Why | Decision |
| --- | --- | --- |
| vectra | Pure TypeScript, zero native dependencies. File-backed JSON. Built-in cosine similarity + BM25 hybrid. Works in Joplin's plugin environment. | :white_check_mark: Primary |
| In-memory array | Zero dependencies and fast, but lost on restart and RAM-intensive for large vaults. | :yellow_circle: Fallback / early dev |
| sqlite-vec | Fast and well-designed, but requires native C bindings, which conflicts with Joplin's no-native-packages constraint. | :cross_mark: Not viable |
| LanceDB, Chroma, Qdrant | Require native binaries or a running server. | :cross_mark: Not viable |

Embedding model

Scores from the MTEB leaderboard, retrieval task:

| Model | MTEB Retrieval | Size | How it runs | Decision |
| --- | --- | --- | --- | --- |
| BGE-small-en-v1.5 | 51.7 | 24 MB | Transformers.js WASM | :white_check_mark: Bundled default |
| all-MiniLM-L6-v2 | 49.1 | 23 MB | Transformers.js WASM | :yellow_circle: Fallback |
| nomic-embed-text | ~62.3 | 274 MB | Ollama | :white_check_mark: Optional for power users |
| text-embedding-3-small | ~62.3 | API | OpenAI | :white_check_mark: Optional, user API key |

BGE-small ships bundled with the plugin. Indexing works immediately after installation — no Ollama setup, no API key. The difference over MiniLM is 51.7 vs 49.1 MTEB retrieval score at nearly identical size, which is worth having.

Chunking

Same heading-first approach Jarvis uses (it works), with additions:

  • Notes without headings fall through to a 400-token sliding window, 50-token overlap
  • Sections under ~80 tokens get merged with the next section
  • Each chunk stores: note ID, note title, heading path (e.g. Research / ML / Transformers), tags, chunk index

The heading path is what makes citations useful; users see not just which note answered their question, but which section of it.
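The sliding-window fallback can be sketched as follows, using whitespace tokens as a rough stand-in for the real tokenizer (window and overlap sizes match the proposal's 400/50 defaults):

```typescript
// Fallback chunker for notes without headings: fixed-size window with
// overlap. Whitespace splitting approximates tokenization for the sketch.
function slidingWindowChunks(
  body: string,
  windowSize = 400,
  overlap = 50,
): string[] {
  const tokens = body.split(/\s+/).filter((t) => t.length > 0);
  const chunks: string[] = [];
  const step = windowSize - overlap; // how far each window advances
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + windowSize).join(' '));
    if (start + windowSize >= tokens.length) break; // last window covered the tail
  }
  return chunks;
}
```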

The four retrieval improvements

1. Cross-encoder re-ranking

vectra's cosine similarity finds chunks that are topically close to the query. A cross-encoder evaluates query and chunk together, which is more accurate at spotting whether a chunk actually answers the question. The pipeline is cosine similarity to retrieve top 20 candidates, then cross-encoder/ms-marco-MiniLM-L-6-v2 via Transformers.js to re-score and select top 5. Around 150ms added latency on the CPU is acceptable when the query is going to an LLM anyway.

The mentor noted that large context windows let a strong LLM implicitly re-rank by filtering false positives. That works but wastes tokens on every query. Explicit re-ranking keeps the prompt tight.
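A sketch of the re-ranking stage, with `scorePair` standing in for the cross-encoder call (in practice, `cross-encoder/ms-marco-MiniLM-L-6-v2` run through Transformers.js):

```typescript
// Re-rank the cosine-retrieval candidates with a (query, chunk) scorer.
// `scorePair` is an injected stand-in for the cross-encoder inference.
interface Scored { chunk: string; score: number; }

async function rerank(
  query: string,
  candidates: string[], // e.g. top 20 from cosine retrieval
  scorePair: (q: string, c: string) => Promise<number>,
  keep = 5,             // final chunks sent to the LLM
): Promise<Scored[]> {
  const scored = await Promise.all(
    candidates.map(async (chunk) => ({
      chunk,
      score: await scorePair(query, chunk),
    })),
  );
  return scored.sort((a, b) => b.score - a.score).slice(0, keep);
}
```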

2. Hybrid BM25 + vector scoring

vectra has built-in support for hybrid BM25+vector retrieval (the isBm25 flag). This means exact term matches get a boost without the user needing to manually switch to keyword search mode. When a user asks about a specific person's name or a technical library, BM25 ensures those exact matches surface even if the semantic embedding similarity is imprecise. For conceptual questions, vector similarity leads. The two are combined automatically via Reciprocal Rank Fusion.
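Reciprocal Rank Fusion itself is only a few lines: each document's fused score is the sum of 1/(k + rank) over every result list it appears in, with k = 60 as the conventional constant from the original RRF paper.

```typescript
// Fuse multiple ranked lists (e.g. BM25 and vector results) with RRF.
// Documents appearing high in either list accumulate a larger score.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      // rank is 1-based: the top result contributes 1 / (k + 1)
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```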

3. Query decomposition

A question like "compare what I wrote about React vs Vue in terms of performance" spans two topics. Single-shot retrieval will favour whichever topic has more notes. The fix is to detect multi-topic questions, generate sub-queries ("React performance", "Vue performance"), retrieve for each independently, then merge the results before re-ranking. Single-topic questions skip this step entirely.
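A rough sketch of the gate and the decomposition step, with `askLLM` as a hypothetical call to the configured model. The regex gate is deliberately crude; its only job is to decide whether the extra LLM round trip is worth spending:

```typescript
// Decompose a possibly multi-topic question into sub-queries.
// `askLLM` is an injected stand-in for the configured LLM backend.
async function decompose(
  question: string,
  askLLM: (prompt: string) => Promise<string>,
): Promise<string[]> {
  // Cheap heuristic gate before spending an LLM call (a sketch, not
  // a precise classifier).
  const looksMultiTopic = /\b(vs\.?|versus|compare|both)\b/i.test(question);
  if (!looksMultiTopic) return [question];
  const raw = await askLLM(
    `Split this question into independent sub-queries, one per line:\n${question}`,
  );
  const subs = raw.split('\n').map((s) => s.trim()).filter(Boolean);
  return subs.length > 1 ? subs : [question];
}
```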

4. Relevant Segment Extraction (RSE)

After re-ranking, some high-scoring chunks may be adjacent in the original note and read better as one continuous passage. RSE detects when consecutive chunks from the same note are contiguous and merges them into a single segment before prompt injection. This avoids the mid-sentence retrieval boundaries that can make answers feel disconnected from their source. Cross-section merging is blocked using a section_id field stored per chunk.
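A sketch of the merging rule: chunks merge only when they come from the same note, carry the same `section_id`, and have consecutive chunk indices.

```typescript
// RSE merge step: fold re-ranked chunks into contiguous segments.
// section_id blocks merging across heading boundaries.
interface Chunk { noteId: string; sectionId: string; index: number; text: string; }

function mergeSegments(chunks: Chunk[]): Chunk[] {
  const sorted = [...chunks].sort(
    (a, b) => a.noteId.localeCompare(b.noteId) || a.index - b.index,
  );
  const merged: Chunk[] = [];
  for (const c of sorted) {
    const last = merged[merged.length - 1];
    if (
      last &&
      last.noteId === c.noteId &&
      last.sectionId === c.sectionId &&
      c.index === last.index + 1
    ) {
      last.text += ' ' + c.text; // contiguous: extend the segment
      last.index = c.index;
    } else {
      merged.push({ ...c });
    }
  }
  return merged;
}
```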

Indexing and keeping it current

Initial sync: All notes are fetched via joplin.data.get() in batches of 50, processed in a background worker_threads instance. The main Electron thread stays free.

Change detection: Each note's body is hashed (SHA-256). The hash is stored alongside its vectors. On any re-index run, notes whose hash hasn't changed are skipped. This makes routine re-index passes fast.
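The change-detection check is small enough to show in full (this assumes Node's built-in crypto module, which desktop plugins can use):

```typescript
import { createHash } from 'node:crypto';

// Hash a note body; the hash is stored alongside the note's vectors.
function bodyHash(body: string): string {
  return createHash('sha256').update(body, 'utf8').digest('hex');
}

// A note is re-indexed only when its stored hash is missing (new note)
// or no longer matches the current body.
function needsReindex(body: string, storedHash: string | undefined): boolean {
  return storedHash !== bodyHash(body);
}
```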

Event-driven updates: joplin.workspace.onNoteChange fires when the currently open note changes. This handles the common case: the user edits a note they're looking at, and the index updates within seconds.

Periodic background polling: onNoteChange only fires for the currently selected note, not for bulk changes (imports, sync). To catch those, a lightweight background timer compares note updated_time values against the last-indexed timestamps and queues anything that's changed. This runs infrequently (every few minutes) and is near-instant for vaults that haven't changed.

Architecture Overview

UI

The plugin uses joplin.views.panels.create() for a persistent sidebar rather than a dialog, since a dialog can't hold an ongoing conversation. The panel loads a React app via the plugin's HTML injection system.

A [ + Context Filter ] dropdown lets users scope queries to specific notebooks or tags. When active, vectra's metadata filtering handles this; no separate query path is needed.

Citations work by prompting the LLM to include [note_id] markers in its response. The React frontend parses these and renders them as clickable badges. Clicking one runs joplin.commands.execute('openNote', noteId) which navigates the main Joplin window to that note. The answer and its source are one click apart.
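A sketch of the parser, assuming the LLM is prompted to emit bare `[note_id]` markers and that note IDs follow Joplin's 32-character hex format:

```typescript
// Split an LLM answer into text runs and citation badges.
type Part =
  | { kind: 'text'; value: string }
  | { kind: 'citation'; noteId: string };

function parseCitations(answer: string): Part[] {
  const parts: Part[] = [];
  // Joplin note IDs are 32 lowercase hex characters.
  const re = /\[([0-9a-f]{32})\]/g;
  let last = 0;
  for (const m of answer.matchAll(re)) {
    if (m.index! > last) parts.push({ kind: 'text', value: answer.slice(last, m.index) });
    parts.push({ kind: 'citation', noteId: m[1] });
    last = m.index! + m[0].length;
  }
  if (last < answer.length) parts.push({ kind: 'text', value: answer.slice(last) });
  return parts;
}
```

The React frontend renders `citation` parts as clickable badges wired to `joplin.commands.execute('openNote', noteId)`.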

Prompt construction and multi-turn design

The prompt has three layers assembled in this order: system instruction (~150 tokens, fixed), retrieved chunks (top 5 after reranking, each capped at 400 tokens, so ~2,000 tokens maximum), and conversation history.
Token budgeting works as follows: after the chunks and system prompt are placed, the remaining budget is calculated. Conversation history is then added newest-turn-first until the budget is exhausted, with the kept turns sent to the model in chronological order. For a typical 8k-context model this comfortably fits several turns. For smaller local models the history window shrinks automatically: the logic drops the oldest turns first, never the chunks.
For multi-turn retrieval specifically, conversation history feeds the query rewriting step before embedding. The new question is rewritten into a standalone query using the last 2–3 turns as context. So a follow-up like "what about the second approach?" becomes "what is the second approach to X that was discussed", which actually retrieves meaningfully instead of matching nothing.
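The budgeting logic as a sketch; `countTokens` is an injected stand-in for the real tokenizer (a word-count estimate would do for a first pass):

```typescript
// Fit conversation history into the remaining token budget.
// Walk newest-first so the oldest turns are the ones dropped, then
// restore chronological order for the prompt.
interface Turn { role: 'user' | 'assistant'; text: string; }

function fitHistory(
  history: Turn[],  // chronological order
  budget: number,   // context size - system prompt - retrieved chunks
  countTokens: (s: string) => number,
): Turn[] {
  const kept: Turn[] = [];
  let used = 0;
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = countTokens(history[i].text);
    if (used + cost > budget) break;
    kept.unshift(history[i]); // prepend to keep chronological order
    used += cost;
  }
  return kept;
}
```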

IPC message schema

| Direction | type | Payload |
| --- | --- | --- |
| Panel → Plugin | query | { text, notebookFilter[], tagFilter[] } |
| Plugin → Panel | token | { text } (streaming chunk) |
| Plugin → Panel | answer | { text, citations[] } |
| Plugin → Panel | indexProgress | { current, total } |
| Plugin → Panel | error | { message, code } |
| Panel → Plugin | openNote | { noteId } |

Privacy

Default (local) mode: BGE-small runs via WASM, Ollama handles generation; no note content leaves the machine. The vectra index lives in joplin.plugins.dataDir() and is not part of Joplin's sync.

Cloud mode: only the question and the retrieved chunks (a few paragraphs) are sent via HTTPS. The full vault is never transmitted. Users see a clear warning before this mode is enabled. API keys are stored through Joplin's settings API, never in plaintext.

Potential challenges

Transformers.js WASM in Joplin's Webpack environment. Jarvis ships a model this way, confirming it's possible. But WASM module loading can have configuration quirks in specific Webpack setups. I'll validate this in the first week of Community Bonding with a minimal proof-of-concept before writing the main indexing pipeline.

vectra memory usage for large vaults. vectra loads the entire index into memory. A vault with 10,000 notes at ~5 chunks each and 384-dimensional float32 vectors is roughly 75 MB of index data in RAM; manageable, but worth measuring. If a user has an unusually large vault, I'll add a configurable chunk limit or document the limitation.

Cross-encoder latency on slow machines. 150ms re-ranking over 20 chunks is fine on a modern machine. On older hardware it could be noticeable. I'll make re-ranking optional and off by default for users who want faster responses.

Vectra on mobile and Transformers.js compatibility. Vectra's file-based storage relies on Node.js fs; mobile is already out of scope for v1.0, but it's worth understanding the constraint now rather than discovering it blocks a future port. Before writing the main indexing pipeline, I'll study @HahaBill's AI summarization plugin to understand exactly how Transformers.js runs inside Joplin's plugin sandbox, since that's the closest existing reference for this kind of ML integration in the codebase.

Testing

  • Unit tests: chunking edge cases (no headings, code blocks at section boundaries, empty notes), hash change detection, vectra CRUD operations
  • Integration test: index a 300-note test vault, modify one note, verify only that note's chunks are updated
  • Retrieval quality: a set of 15 question/expected-source-note pairs run against both pipelines, cosine-only and cosine+reranker, comparing MRR (Mean Reciprocal Rank) and Hit@3. If the correct source note appears at rank 1 more often with reranking than without, that confirms the improvement is real. Results from both pipelines are logged so the comparison is explicit rather than subjective.
  • Manual testing on a real Joplin vault of reasonable size before submission

Documentation

  • User guide: installation, first-run indexing, Ollama setup, API key configuration, privacy explanation, notebook scoping
  • Developer notes: the internal module boundaries, IPC schema, and how to swap the embedding or storage layer are relevant if a shared infrastructure emerges from the other AI GSoC projects

4. Implementation Plan

Community Bonding (May 4 – May 26)

The two things I want to confirm before coding starts:

  • Transformers.js WASM loading works correctly in Joplin's Webpack/Electron build (minimal test plugin)
  • vectra indexing and querying works inside a plugin's sandboxed Node.js environment

Beyond validation: read Jarvis source in depth, finalise the module interface design with @shikuz, set up the repository and CI/CD.

Phase 1: Indexer (May 27 – June 24)

Week 1 — Chunking: heading-aware splitter, sliding window fallback, section merging, unit tests for edge cases

Week 2 — Embedding pipeline: BGE-small via Transformers.js, running in worker_threads

Week 3 — vectra integration: insert, update (delete + reinsert by note ID), delete, metadata filters for notebook/tag scoping

Week 4 — Incremental indexing: SHA-256 hashing, onNoteChange hook, periodic polling for bulk changes, IPC progress messages

Midterm checkpoint: given a Joplin vault, the indexer chunks everything, stores vectors in vectra, and correctly handles note edits and deletions on re-run.

Phase 2: Query Pipeline (June 25 – July 22)

Week 5 — Hybrid retrieval: vectra's built-in BM25+vector scoring with RRF fusion

Week 6 — Cross-encoder re-ranking: top 20 → top 5 via Transformers.js

Week 7 — Query decomposition: multi-topic detection, sub-query generation, result merging

Week 8 — RSE: contiguous chunk merging with cross-section boundary protection; prompt builder with token budget management; Ollama + OpenAI/Anthropic integration

Phase 3: UI (July 23 – August 12)

Week 9 — React chat panel: message layout, typing indicator, status bar, first-run progress

Week 10 — IPC bridge and streaming: token-by-token rendering

Week 11 — Context Filter UI (notebook/tag scoping); settings panel (API keys, model selection, excluded notebooks)

Week 12 — Citation engine: [note_id] parsing, clickable badges, Sources accordion, openNote integration

Phase 4: Testing and Docs (August 13 – September 1)

Week 13 — Large vault testing, retrieval quality checks, bug fixing

Week 14 — User and developer documentation

Final week — Code review, PR cleanup, plugin registry submission


5. Deliverables

  • A Joplin plugin installable from the plugin registry
  • Zero-setup indexing: BGE-small bundled and works without Ollama or an API key
  • Background indexer with hash-based change detection and event + polling sync
  • Hybrid BM25+vector retrieval with automatic fusion
  • Cross-encoder re-ranking
  • Query decomposition for multi-topic questions
  • Relevant Segment Extraction
  • React sidebar chat with streaming, scoping, and clickable citations
  • Local LLM support via Ollama; optional cloud API support
  • Unit and integration test suite
  • User and developer documentation

6. Availability

Hours per week: 40–45
Time zone: IST (UTC+5:30)
Other commitments: No conflicting internships or jobs. College coursework will not affect committed GSoC hours.

I'll post weekly progress updates on the Joplin forum and stay available on Discord throughout the program.

Would appreciate your feedback when you have the time :slight_smile:
@shikuz, @HahaBill , @malekhavasi

Hey @dipanshurdev, sorry for the late reply. The retrieval pipeline is well-scoped.

A few questions. The token budget is mentioned in Week 8 but not designed - how do you fit chunks + conversation history + system prompt into the context window? Same for multi-turn: how does the second question in a session use context from the first?

You mention 15 manual question/source pairs for evaluation. How would you know whether reranking actually improved results vs. cosine-only?

On mobile: would Vectra's file-based storage work there? Worth looking at the AI summarisation plugin (by @HahaBill) for how Transformers.js runs inside a Joplin plugin.

Hey @shikuz, thanks for the feedback, really helpful questions.

On token budget and multi-turn: The prompt has three layers: system instruction, retrieved chunks, and conversation history. The token counter runs after chunks are selected, calculates the remaining budget, then adds history turns (newest-first, so the oldest get dropped) until it's exhausted. For multi-turn, conversation history also feeds the query rewriting step: the new question gets rewritten into a standalone query before embedding, so follow-up questions like "what about the second approach?" actually retrieve meaningfully instead of matching nothing. I'll add this design explicitly to the proposal rather than leaving it to Week 8.

On evaluation: You're right that 15 pairs alone don't tell you whether reranking helped. The proper comparison is running cosine-only and cosine+reranker on the same set and comparing MRR and Hit@3. If the correct note appears at rank 1 more often with reranking, that's the signal. I'll log ranked results from both pipelines so the comparison is explicit. I'll update the proposal to reflect this.

On mobile and Vectra: Honestly, I'm not certain Vectra's file I/O behaves the same way on mobile. I'll look at @HahaBill's summarisation plugin to understand how Transformers.js actually runs inside Joplin before assuming it translates. Mobile is out of scope for v1.0 but worth understanding now so I don't accidentally close off that path.

Hi @shikuz, I’ve updated the proposal to address your feedback by:

  • Adding an explicit token budget and multi-turn design to Section 3
  • Updating the evaluation plan to compare MRR/Hit@3 across cosine-only vs. reranker pipelines
  • Adding a note on Vectra/mobile constraints, with a plan to study @HahaBill’s plugin before starting

Thanks again for the pointer!