Links
| | |
|---|---|
| Project Idea | gsoc/ideas.md at master · joplin/gsoc · GitHub |
| GitHub | anuradha1304 (Anuradha Verma) · GitHub |
| Forum intro | Welcome to GSoC 2026 with Joplin! - #24 by anuradha1304 |
| Pull requests submitted | https://github.com/laurent22/joplin/pull/14650 - Merged |
| Other relevant experience | Built a deepfake detection project (image classification, AI/ML pipeline exposure). Built an independent Hospital Management System in React (patient records, appointments, billing - real-world state management at scale). |
AI Assistance Disclosure:
Used AI tools to research RAG architectures, review LanceDB and Transformers.js documentation, and explore Joplin's plugin API surface. Technical decisions, implementation plan, and all writing are my own. I have reviewed and fully understand every decision made here. I did not use AI to write this proposal - only to accelerate research.
1. Introduction
I'm Anuradha Verma, a third-year Computer Science and Engineering student at Chandigarh University, Punjab, India. My focus is frontend development - React, JavaScript, TypeScript, and building interfaces that feel good to use.
I started contributing to Joplin in early 2026. My first PR (#14650) fixed a keyboard focus bug in the note list - when a note was deleted, focus was lost entirely, breaking keyboard navigation. Working on that fix required reading NoteList2.tsx, understanding the Redux state shape (focusedField, selectedFolderId, selectedNoteIds), the useFocusNote hook, and how Joplin manages ARIA accessibility. That PR is now merged. I understand how the desktop app is structured.
I'm applying for this project because I've personally felt the problem it solves. I keep notes constantly - class material, research, ideas. There have been multiple times I knew I had saved something important but couldn't find it. I remembered writing it, but couldn't recall the exact words I used, so keyword search failed me. The information was there. I just couldn't get to it. That frustration is exactly what this project fixes - and fixing it in a way that works entirely locally, without sending notes to any server, is the right way to do it.
2. Project Summary
The problem: Joplin users build large note collections over years. Keyword search only works if you remember the exact words you used. If you know something is in your notes but can't remember the phrasing, you're stuck.
Why it matters: Power users - researchers, writers, students, developers - maintain thousands of notes. The knowledge is there. Retrieving it is the bottleneck.
What will be implemented: A Joplin desktop plugin that lets users ask natural language questions about their notes and receive answers citing the specific notes they came from. Responses stream in real time. Citations are clickable and open the source note directly in Joplin. The entire pipeline runs locally by default - no data leaves the machine.
Expected outcome: A published Joplin plugin with a streaming chat panel, local indexing pipeline, multi-provider LLM support (Ollama, OpenAI, Anthropic), hybrid retrieval (vector + keyword search), and clickable note citations.
Out of scope: Mobile support, multi-language embeddings beyond English, real-time collaborative chat, training custom models.
3. Technical Approach
Architecture:
```
[Plugin Panel — React UI]
        ↕ IPC (postMessage)
[Plugin Backend — Node.js]
  ├── Indexing Service
  │   ├── Note fetcher (joplin.data API)
  │   ├── Markdown chunker (heading-aware)
  │   ├── Embedding generator (Transformers.js)
  │   └── Vector store (LanceDB)
  └── Query Service
      ├── Hybrid retriever (vector + Joplin FTS)
      ├── Context builder + prompt assembler
      └── LLM client (Ollama / OpenAI / Anthropic)
```
Why a plugin: The plugin API (joplin.views.panels, joplin.data, joplin.workspace) provides everything needed. A plugin can be installed optionally, updated independently, and distributed via the plugin store without requiring core changes.
Note ingestion and chunking: Notes are fetched page by page via joplin.data.get(['notes'], { fields: ['id', 'title', 'body', 'updated_time'], page }) - the data API is paginated, so a small helper loops until has_more is false. Each note's Markdown body is split into chunks at heading boundaries and paragraph breaks - not arbitrary character counts. Each chunk is tagged with noteId, title, headingContext, and chunkIndex. This makes citations precise - the AI can reference a specific section, not just the whole note.
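A minimal sketch of the heading-aware splitter, reduced to heading boundaries only (paragraph-break splitting and all names here are illustrative, not the final implementation):

```typescript
interface Chunk {
  noteId: string;
  title: string;
  headingContext: string; // nearest enclosing heading, '' for preamble
  chunkIndex: number;
  text: string;
}

// Split a note body at ATX heading boundaries, carrying the current
// heading along so each chunk can later be cited as [title · heading].
function chunkMarkdown(noteId: string, title: string, body: string): Chunk[] {
  const chunks: Chunk[] = [];
  let heading = '';
  let buffer: string[] = [];
  const flush = () => {
    const text = buffer.join('\n').trim();
    if (text) {
      chunks.push({ noteId, title, headingContext: heading, chunkIndex: chunks.length, text });
    }
    buffer = [];
  };
  for (const line of body.split('\n')) {
    const m = line.match(/^#{1,6}\s+(.*)$/);
    if (m) {
      flush();           // close the section that just ended
      heading = m[1].trim();
    } else {
      buffer.push(line);
    }
  }
  flush();               // close the final section
  return chunks;
}
```

A real chunker would additionally cap chunk size and split long sections at paragraph breaks, but the heading-tracking shape stays the same.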
Incremental indexing: On startup and on joplin.workspace.onNoteChange(), the indexer compares each note's updated_time against a stored timestamp in LanceDB metadata. Only changed or new notes are re-embedded. A collection of 10,000 notes is indexed once; subsequent startups are near-instant.
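The updated_time comparison reduces to a pure diff between the current notes and the timestamps recorded at last index time. A sketch (function and field names are illustrative):

```typescript
interface NoteStamp { id: string; updated_time: number; }

// Given all current notes and the per-note updated_time stored in LanceDB
// metadata at last index time, return the note ids that need (re-)embedding
// and the ids whose notes were deleted and whose vectors should be dropped.
function diffIndex(
  notes: NoteStamp[],
  indexed: Map<string, number>,
): { toEmbed: string[]; toDelete: string[] } {
  const toEmbed: string[] = [];
  const seen = new Set<string>();
  for (const n of notes) {
    seen.add(n.id);
    const prev = indexed.get(n.id);
    // New note (no stored stamp) or modified since last indexing run.
    if (prev === undefined || prev < n.updated_time) toEmbed.push(n.id);
  }
  // Anything indexed before but no longer present was deleted.
  const toDelete = Array.from(indexed.keys()).filter(id => !seen.has(id));
  return { toEmbed, toDelete };
}
```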
Embeddings: Default: all-MiniLM-L6-v2 via Transformers.js running in Node.js (WASM). 384-dimensional vectors, fast on CPU, no GPU or API key needed. Optional: OpenAI text-embedding-3-small for higher quality.
Vector store: LanceDB - embedded, no server process, works natively in Node.js/Electron, stores vectors alongside metadata. Index lives in joplin.plugins.dataDir.
Hybrid retrieval: Most RAG implementations rely on vector search alone, which misses exact keyword matches - proper nouns, technical terms, specific facts. This plugin combines:

- Vector similarity search (top-K from LanceDB)
- Keyword search via joplin.data.get(['search'], { query: userQuery }) - Joplin's existing full-text search

Results are merged using Reciprocal Rank Fusion, which gives significantly better recall than either method alone.
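The Reciprocal Rank Fusion merge is small enough to sketch in full. Each result contributes 1 / (k + rank) for every ranking list it appears in; k = 60 is the constant from the original RRF paper (names here are illustrative):

```typescript
// Merge multiple ranked lists of chunk ids into one list, ordered by
// summed reciprocal-rank score. Items that appear near the top of both
// the vector ranking and the keyword ranking rise to the top.
function rrfMerge(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      // rank is 1-based: the first item in a list gets 1 / (k + 1)
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

The same function works regardless of how many retrievers feed it, so adding a third signal later (e.g. tag match) would not change the merge step.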
LLM integration: Three providers, all configurable via Joplin's settings API:

- Ollama (default, local): llama3.2:3b - no API key, fully offline
- OpenAI: gpt-4o-mini
- Anthropic: claude-3-5-haiku

All providers use streaming - responses appear token by token, not after a full wait.
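As a sketch of the streaming path: Ollama's /api/chat endpoint streams newline-delimited JSON, one object per token, and network chunks can split lines mid-object, so the client keeps a carry-over buffer between reads. The field shapes below follow Ollama's chat API; the parser itself is illustrative, and the OpenAI/Anthropic SSE formats would get analogous parsers behind the same interface:

```typescript
interface StreamState { carry: string; }

// Feed one raw network chunk in, get zero or more completed tokens out.
// Incomplete trailing lines are held in state.carry until the next chunk.
function parseNdjsonChunk(state: StreamState, chunk: string): string[] {
  const tokens: string[] = [];
  state.carry += chunk;
  const lines = state.carry.split('\n');
  state.carry = lines.pop() ?? ''; // last piece may be an incomplete line
  for (const line of lines) {
    if (!line.trim()) continue;
    const msg = JSON.parse(line);
    // Ollama puts each streamed token in message.content.
    if (msg.message && msg.message.content) tokens.push(msg.message.content);
  }
  return tokens;
}
```

Each returned token is forwarded to the panel over IPC as it arrives, which is what makes the answer appear incrementally.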
Chat UI: The React panel handles:

- Streaming text display (tokens appear as generated)
- Source citations as chips: [Note title · Section heading]
- Clicking a citation calls joplin.commands.execute('openNote', noteId) - opens the note in Joplin
- Conversation history per session
- Follow-up questions using the previous context window
- Indexing progress indicator so users know when the index is ready
- Settings accessible from the panel header (provider, model, chunk count)
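The citation chips can be derived directly from the retrieved chunks' metadata. A sketch of the dedup-and-label step (names illustrative; the actual click handler would call joplin.commands.execute('openNote', chip.noteId) on the plugin side, which is not exercised here):

```typescript
interface RetrievedChunk { noteId: string; title: string; headingContext: string; }
interface CitationChip { label: string; noteId: string; }

// Build deduplicated citation chips for an answer's footer. Several chunks
// from the same note section should collapse into a single chip.
function buildCitations(chunks: RetrievedChunk[]): CitationChip[] {
  const seen = new Set<string>();
  const chips: CitationChip[] = [];
  for (const c of chunks) {
    const label = c.headingContext ? `${c.title} · ${c.headingContext}` : c.title;
    const key = `${c.noteId}|${label}`;
    if (seen.has(key)) continue;
    seen.add(key);
    chips.push({ label, noteId: c.noteId });
  }
  return chips;
}
```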
Changes to Joplin core: Minimal. This is a plugin. If a small addition to the plugin API is needed (e.g. if onNoteChange doesn't cover all relevant events), I'll audit this during community bonding and file a separate small PR if required.
Risks and mitigations: LLM inference on low-end hardware is addressed by defaulting to llama3.2:3b which runs on CPU with 8GB RAM. LanceDB and Transformers.js compatibility with Electron will be prototyped and confirmed during community bonding before any production code is written. Embeddings are generated in a worker thread in batches to avoid blocking the UI.
Testing strategy:

- Unit tests for the Markdown chunker (various note formats, empty notes, code-block-only notes)
- Unit tests for the hybrid retrieval merger (RRF algorithm correctness)
- Integration tests for the full indexing pipeline on a small note collection
- Manual performance testing on collections of 100, 1,000, and 5,000 notes
Documentation:

- User guide: installation, first-time setup, choosing a provider, how to ask effective questions
- Developer guide: architecture overview, how to add a new LLM provider, IPC message format
4. Implementation Plan
| Period | Tasks |
|---|---|
| Community Bonding (May 8 - June 1) | Study full plugin API docs, prototype LanceDB + Transformers.js in a standalone Node.js script, confirm onNoteChange behavior, discuss architecture with mentor |
| Weeks 1-2 (June 2 - 15) | Plugin scaffold with working panel (empty UI), note fetcher via joplin.data, heading-aware Markdown chunker, unit tests for chunker |
| Weeks 3-4 (June 16 - 29) | LanceDB integration, Transformers.js embedding in worker thread, full index build on first run, incremental indexing via updated_time, progress reporting to UI via IPC |
| Weeks 5-6 (June 30 - July 13) | Vector similarity query, Joplin FTS keyword search, Reciprocal Rank Fusion merger, context window builder, unit tests for retrieval + RRF |
| Weeks 7-8 (July 14 - 27) | Ollama, OpenAI, Anthropic streaming clients, prompt template with conversation history, IPC streaming to UI |
| Weeks 9-10 (July 28 - Aug 10) | Full React chat panel, streaming display, clickable citations, settings panel, follow-up threading, UI accessibility (keyboard nav, ARIA) |
| Weeks 11-12 (Aug 11 - 24) | Integration tests, performance testing on 5,000+ note collections, user + developer docs, plugin store submission, final report |
5. Deliverables
| Deliverable | Details |
|---|---|
| 1. Published Joplin plugin | Installable from the plugin store, all desktop platforms |
| 2. Local indexing pipeline | LanceDB + Transformers.js, fully on-device, no API key required by default |
| 3. Hybrid retrieval | Vector + Joplin FTS combined via Reciprocal Rank Fusion |
| 4. Multi-provider LLM support | Ollama (local), OpenAI, Anthropic - configurable via settings |
| 5. Streaming chat UI | React panel, real-time token streaming, conversation history, follow-up support |
| 6. Clickable note citations | Every answer links to its source notes - clicking opens the note in Joplin |
| 7. Incremental indexing | Only re-processes changed notes, via updated_time |
| 8. Test suite | Unit tests (chunker, retrieval, RRF) + integration tests for the full pipeline |
| 9. Documentation | User guide + developer architecture guide |
6. Availability
- Weekly hours: 40 hours/week. No internship, no part-time job, no summer courses - GSoC is my only commitment during this period.
- Time zone: IST (UTC+5:30), India
- Communication: Available on the Joplin forum daily. Weekly progress reports. Available for live calls with the mentor when needed.
- University exams: End-semester exams finish by late April, well before the June 2 coding start - zero conflict.