Links
- Project Idea: Chat with your note collection using AI
- GitHub Profile: DevrG03 (Devrajsinh Gohil) · GitHub
- Forum Introduction Post: Intro Post
- Pull Requests to Joplin: PR #14926, Issue #9637
- Relevant Projects:
- POC Repository: GITHUB
1. Introduction
I am pursuing an M.Tech in Information and Communication Technology at Dhirubhai Ambani University, with a focus on AI systems and software engineering.
During my internship at JSW Steel, I built an AI support assistant for manufacturing operations that reduced troubleshooting time by about 50% by combining query understanding with knowledge-grounded responses.
I have also developed HigherAI and Smart DPR Manager, both end-to-end AI systems involving retrieval, processing, and generation. These directly align with this proposal.
For this GSoC project, I aim to build a privacy-focused conversational interface for Joplin so users can query their own notes in natural language and receive grounded, source-backed answers.
2. Project Summary
Problem
As note collections grow, information retrieval becomes difficult in practice, not just in theory.
For example, a user with hundreds of notes cannot reliably answer intent-level questions (e.g., "what patterns appear in my deep work notes?") with keyword search alone, because keyword search depends on recalling exact phrases and the relevant context is scattered across many documents.
In my working POC on a realistic 6-note, 66-chunk corpus, the two-stage retrieval pipeline returns relevant context in about 14ms on average, which is fast enough to feel immediate while still being grounded.
The gap this project closes is the difference between a static note archive and a queryable personal knowledge base.
Proposed Solution
A Joplin plugin with a Retrieval-Augmented Generation (RAG) pipeline:
Notes -> Chunking -> Embeddings -> Vector Store
User Query -> Hybrid Retrieval -> RRF Fusion -> Reranking -> LLM -> Response
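To make this flow concrete, the following TypeScript sketch outlines the data each stage exchanges; the names and fields are illustrative, not final plugin APIs.

```typescript
// Illustrative shapes for the two pipeline paths; names and fields are not final.

interface Chunk {
  id: string;         // deterministic chunk ID (e.g. noteId + chunk index)
  noteId: string;     // source note, used for citations
  text: string;       // chunk body passed to the embedder and the LLM
  vector?: number[];  // embedding produced at indexing time
}

interface RetrievedChunk extends Chunk {
  score: number;      // fused and reranked relevance score
}

interface GroundedAnswer {
  text: string;                                       // answer constrained to retrieved context
  citations: { noteId: string; chunkId: string }[];   // sources surfaced in the UI
}

// Indexing path: Notes -> Chunking -> Embeddings -> Vector Store
type IndexNotes = (noteIds: string[]) => Promise<void>;

// Query path: Query -> Hybrid Retrieval -> RRF -> Reranking -> LLM -> Response
type AskNotes = (query: string) => Promise<GroundedAnswer>;
```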
Expected Outcome
- Chat interface embedded in Joplin (desktop first)
- Grounded answers with citations
- Retrieval diagnostics for transparency
- Local-first defaults (optional cloud model providers)
3. What Is Already Implemented (POC Evidence)
I have already built a production-like POC that validates feasibility:
- Local embeddings using Transformers.js
- LanceDB local vector storage
- Hybrid retrieval: vector + BM25-lite keyword retrieval
- Reciprocal Rank Fusion (RRF)
- Reranking stage
- Grounded answer generation via Ollama
- API endpoints: /health, /reindex, /retrieve, /chat
- Web chat with retrieval diagnostics panel
- Smoke tests and benchmark script
Current POC performance snapshot
- Indexed 66 chunks from 6 realistic long notes in 665ms (local run)
- Retrieval benchmark (average):
- embed stage: ~12.2ms
- retrieval stage: ~1.8ms
- end-to-end retrieval path: ~14.0ms
- Live chat samples (debug timings):
- embedQuery: 7–23ms
- retrieve: 32–48ms
- answer (Ollama): ~1241–1603ms
This demonstrates that the core architecture is not speculative; it already works end-to-end.
4. Technical Approach
4.1 Integration with Joplin Architecture
Target: app-desktop plugin architecture.
Architecture decision: this project uses a managed local sidecar service (Option B), not an in-process monolith.
The plugin spawns the local RAG service as a child process at startup, performs health checks, monitors process state, restarts it on failure, and terminates it on plugin unload.
The plugin communicates with the service via localhost HTTP (/reindex, /retrieve, /chat).
This follows the same isolation principle as language-server style architectures. Transformers.js serialises inference sessions, so indexing and a live query cannot run concurrently within a single process. LanceDB concurrent writes cause commit conflicts that require manual retry and are documented to crash production instances without explicit conflict-handling logic. The sidecar gives each workload its own process: no session conflicts, a single index owner, and clean recovery.
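A minimal sketch of the supervision loop the plugin would run around the sidecar; the service path, port, and retry policy are placeholders, not final values.

```typescript
import { spawn, ChildProcess } from 'node:child_process';

const SERVICE_URL = 'http://127.0.0.1:8787'; // placeholder port, configurable via settings

let service: ChildProcess | null = null;
let stopping = false;

async function healthy(): Promise<boolean> {
  try {
    return (await fetch(`${SERVICE_URL}/health`)).ok;
  } catch {
    return false;
  }
}

function startService(): void {
  // Spawn the local RAG service as a child process of the plugin.
  service = spawn(process.execPath, ['rag-service/index.js'], { stdio: 'ignore' });

  // Restart on unexpected exit, unless the plugin is unloading.
  service.on('exit', () => {
    if (!stopping) setTimeout(startService, 1000);
  });
}

export async function ensureService(): Promise<void> {
  if (!service) startService();
  // Wait for /health before routing /reindex, /retrieve or /chat calls to the service.
  for (let i = 0; i < 20 && !(await healthy()); i++) {
    await new Promise(resolve => setTimeout(resolve, 500));
  }
}

export function stopService(): void {
  // Called on plugin unload: terminate the sidecar cleanly.
  stopping = true;
  service?.kill();
}
```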
Joplin-specific integration points:
- joplin.data.get/post/put/delete for note access and update
- joplin.data.userDataGet/userDataSet for per-note indexing metadata
- joplin.workspace.onNoteChange for incremental indexing
- joplin.workspace.onSyncComplete for sync-triggered reconciliation
- joplin.views.panels for chat + diagnostics UI
- joplin.commands.register for commands (Ask Notes, Reindex, Open Source)
- joplin.settings for configurable model/retrieval options
No Joplin core modification is required.
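For illustration, a minimal plugin entry point could wire these APIs together roughly as follows; handler bodies are elided and the panel and command names are placeholders.

```typescript
import joplin from 'api';

joplin.plugins.register({
  onStart: async () => {
    // Chat + diagnostics panel
    const panel = await joplin.views.panels.create('ragChatPanel');
    await joplin.views.panels.setHtml(panel, '<div id="chat-root">Loading…</div>');

    // "Ask Notes" command (Reindex and Open Source are registered the same way)
    await joplin.commands.register({
      name: 'askNotes',
      label: 'Ask Notes',
      execute: async () => joplin.views.panels.show(panel),
    });

    // Incremental indexing: enqueue only the changed note for re-embedding
    await joplin.workspace.onNoteChange(async event => {
      // event.id identifies the changed note
    });

    // Reconcile the index after a sync brings in remote edits
    await joplin.workspace.onSyncComplete(async () => {
      // compare stored per-note metadata (userDataGet) against current note state
    });
  },
});
```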
4.2 Design Principles
- Local-first by default
- Privacy-preserving processing
- Modular architecture (replaceable retrieval/LLM components)
- Explainable retrieval (diagnostics + citations)
- Incremental indexing for scalability
4.3 Retrieval Architecture (Core Technical Decision)
Stage 1: Hybrid Retrieval
- Dense vector retrieval for semantic matching
- Keyword retrieval for exact terms and entities
- Merge with Reciprocal Rank Fusion (RRF)
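As a concrete reference for the fusion step, a minimal RRF implementation over the two ranked lists looks like this; the constant k = 60 is the commonly used default and remains a tuning choice.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank_in_list(d)).
function rrfFuse(rankedLists: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((docId, index) => {
      const rank = index + 1;
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank));
    });
  }
  return scores;
}

// Example: a chunk ranked highly by both the dense and the keyword list
// outranks one that appears in only a single list.
const fused = rrfFuse([
  ['chunk-a', 'chunk-b', 'chunk-c'], // dense vector ranking
  ['chunk-b', 'chunk-d'],            // keyword (BM25-lite) ranking
]);
const shortlist = [...fused.entries()].sort((a, b) => b[1] - a[1]).slice(0, 3);
```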
Stage 2: Reranking
- Rerank fused candidates for final relevance (already implemented in the POC)
- Baseline reranker: local query-chunk vector similarity on fused shortlist
- Optional enhancement: cross-encoder reranker as a post-GSoC extension if quality gains justify latency cost
Final Selection
- Top reranked chunks are passed to the LLM with strict grounding rules
- Answer includes source references
4.4 Embeddings
- Transformers.js local models (MiniLM/BGE-small family)
- Node.js runtime compatibility
- Offline support with no external embedding API dependency
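A minimal embedding call with Transformers.js, assuming a MiniLM-family model as in the POC; the exact model ID is a placeholder.

```typescript
import { pipeline } from '@xenova/transformers';

// Load the local feature-extraction pipeline once; later calls reuse the warm model.
const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

// Mean-pooled, L2-normalised embedding for a chunk or a query.
async function embed(text: string): Promise<number[]> {
  const output = await embedder(text, { pooling: 'mean', normalize: true });
  return Array.from(output.data as Float32Array);
}

const vector = await embed('what patterns appear in my deep work notes?');
// vector.length === 384 for the MiniLM family
```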
4.5 Vector Storage
Primary: LanceDB (embedded/local).
Why LanceDB over alternatives:
- vs Vectra: Vectra uses brute-force cosine similarity over a flat JSON file - no indexing algorithm, no hybrid search, does not scale beyond a few thousand chunks. LanceDB uses IVF-PQ indexed search, supports hybrid querying, and handles large corpora efficiently.
- vs sqlite-vec: SQLite-vec integrates well with relational data but is not optimised for vector workloads - no native hybrid retrieval, no columnar storage format. LanceDB is purpose-built for vector-first pipelines with better query flexibility and throughput.
- Plugin suitability: LanceDB runs entirely in-process, requires no additional services, and is validated in the POC at 14ms retrieval latency.
Storage abstraction will be kept to allow fallback backends if needed.
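A sketch of the indexing and query path with the LanceDB Node bindings; the table schema and placeholder vectors are illustrative only, and real embeddings come from the Transformers.js stage above.

```typescript
import * as lancedb from '@lancedb/lancedb';

// Open (or create) the embedded database inside the plugin's data directory.
const db = await lancedb.connect('./rag-index');

// One row per chunk: the embedding plus the metadata needed for citations.
const table = await db.createTable('chunks', [
  { vector: Array(384).fill(0), chunkId: 'note1:0', noteId: 'note1', text: 'chunk body' },
]);

// Dense retrieval: nearest chunks to the query embedding (placeholder vector here).
const queryVector = Array(384).fill(0);
const hits = await table.search(queryVector).limit(10).toArray();
```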
4.6 LLM Integration
- Default: local Ollama models
- Optional: OpenAI/Gemini providers via user settings
- Graceful fallback behavior for missing model/service
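The provider abstraction can stay very small; the sketch below shows the default local path against Ollama's HTTP API, with the model name as a placeholder and errors surfaced so the plugin can show fallback messaging.

```typescript
// Minimal provider interface: local Ollama by default, cloud providers behind settings.
interface LlmProvider {
  generate(prompt: string): Promise<string>;
}

const ollamaProvider: LlmProvider = {
  async generate(prompt: string): Promise<string> {
    const res = await fetch('http://127.0.0.1:11434/api/generate', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model: 'llama3.2', prompt, stream: false }), // model is a placeholder
    });
    if (!res.ok) throw new Error(`Ollama unavailable: ${res.status}`); // triggers fallback messaging
    const data = await res.json();
    return data.response;
  },
};
```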
4.7 UI/UX
- Panel-based UI in Joplin
- Multi-turn chat
- Clickable citations to open note context
- Retrieval diagnostics toggle for power users
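As a sketch of the clickable-citation flow above, the panel webview would post the cited note ID back to the plugin, which then jumps to the note; the message shape is an assumption, not a fixed protocol.

```typescript
import joplin from 'api';

// Plugin side: the panel webview posts { type: 'openCitation', noteId } when a
// citation is clicked; the handler opens the cited note in the editor.
export async function wireCitationClicks(panelHandle: string): Promise<void> {
  await joplin.views.panels.onMessage(panelHandle, async (message: any) => {
    if (message?.type === 'openCitation') {
      await joplin.commands.execute('openNote', message.noteId);
    }
  });
}
```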
4.8 Indexing Strategy
- Initial full indexing
- Incremental indexing on note changes
- Metadata-backed change detection (hash + timestamp + version)
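A sketch of the change detection check, assuming a content hash stored per note via user data; the key name is a placeholder.

```typescript
import joplin from 'api';
import { ModelType } from 'api/types';
import { createHash } from 'node:crypto';

const HASH_KEY = 'rag.bodyHash'; // placeholder user-data key

// Reindex a note only when its body hash differs from the stored value.
export async function needsReindex(noteId: string): Promise<boolean> {
  const note = await joplin.data.get(['notes', noteId], { fields: ['id', 'body', 'updated_time'] });
  const hash = createHash('sha256').update(note.body).digest('hex');

  const stored = await joplin.data.userDataGet<string>(ModelType.Note, noteId, HASH_KEY);
  if (stored === hash) return false;

  await joplin.data.userDataSet(ModelType.Note, noteId, HASH_KEY, hash);
  return true;
}
```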
4.9 Reliability & Safety
- Grounded prompt constraints
- Explicit abstention when context is insufficient
- Stage-level latency logging
- Automated smoke tests + benchmark checks
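For the grounding and abstention constraints, a prompt template along these lines would be used; the exact wording is illustrative and will be tuned against the evaluation set.

```typescript
// Grounded prompt sketch: the model may only use the retrieved chunks, must cite
// chunk IDs, and must abstain when the context does not contain the answer.
function buildPrompt(query: string, chunks: { chunkId: string; text: string }[]): string {
  const context = chunks.map(c => `[${c.chunkId}]\n${c.text}`).join('\n\n');
  return [
    'Answer the question using ONLY the context below.',
    'Cite the chunk IDs you used in square brackets.',
    'If the context does not contain the answer, reply exactly: "I could not find this in your notes."',
    '',
    `Context:\n${context}`,
    '',
    `Question: ${query}`,
  ].join('\n');
}
```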
4.10 Testing Strategy
Testing is structured across four distinct layers that separate component correctness, pipeline connectivity, retrieval quality, and performance regression.
Unit tests cover the deterministic behaviour of individual pipeline components:
- Chunking: same note input always produces identical chunk IDs and boundaries, regardless of run order
- RRF fusion: given two ranked lists with known positions, fused scores match the formula exactly
- BM25 scoring: given a fixed corpus and query, term frequency and IDF calculations produce expected values
These run fully offline and are suitable for CI on every commit.
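As an example of these deterministic component tests, an RRF unit test can assert the fused scores against the 1/(k + rank) formula directly; the module path is hypothetical.

```typescript
import { describe, expect, it } from '@jest/globals';
import { rrfFuse } from './fusion'; // hypothetical module path

describe('RRF fusion', () => {
  it('matches the 1/(k + rank) formula for known list positions', () => {
    const scores = rrfFuse([['a', 'b'], ['b', 'a']], 60);
    // 'a' is rank 1 in the first list and rank 2 in the second: 1/61 + 1/62
    expect(scores.get('a')).toBeCloseTo(1 / 61 + 1 / 62, 10);
    // 'b' is rank 2 in the first list and rank 1 in the second: 1/62 + 1/61
    expect(scores.get('b')).toBeCloseTo(1 / 62 + 1 / 61, 10);
  });
});
```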
Integration tests use a fixed 5-note corpus with 10 known-answer queries where the correct source note is established in advance. Each query asserts the correct note appears in the top-3 retrieved chunks. This gives a regression baseline — any code change that degrades retrieval is caught before it reaches the plugin.
Retrieval quality evaluation uses a golden set of 15 query/answer pairs, measured across three pipeline configurations:
| Configuration | Metrics |
|---|---|
| Vector-only baseline | Precision@3, MRR |
| Hybrid + RRF | Precision@3, MRR |
| Hybrid + RRF + Reranking | Precision@3, MRR |
This directly validates that each stage of the pipeline produces a measurable improvement over the previous. The Week 6 and Week 7 success criteria ("improved retrieval relevance" and "better top-3 precision") are defined against these numbers, not subjectively. The POC benchmark script already provides the latency baseline; this evaluation set provides the quality baseline.
Latency benchmarks are enforced as pass/fail thresholds, not just informational:
- Embed stage: under 30ms on warm model
- Retrieval: under 500ms on a 10,000-chunk corpus via LanceDB indexed search
- End-to-end retrieval path: under 200ms excluding LLM generation
These run as part of the existing benchmark script alongside smoke tests. A regression past any threshold causes the script to exit non-zero, blocking the PR.
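A sketch of the pass/fail gate around the existing benchmark script; runBenchmark is a placeholder for its measurement step and the thresholds mirror the targets listed above.

```typescript
type StageTimings = { embed: number; retrieval: number; endToEnd: number };

// Placeholder for the existing benchmark script's measured stage averages.
declare function runBenchmark(): Promise<StageTimings>;

const THRESHOLDS_MS: StageTimings = { embed: 30, retrieval: 500, endToEnd: 200 };

async function main(): Promise<void> {
  const results = await runBenchmark();
  const failures = (Object.keys(THRESHOLDS_MS) as (keyof StageTimings)[])
    .filter(stage => results[stage] > THRESHOLDS_MS[stage]);

  for (const stage of failures) {
    console.error(`FAIL ${stage}: ${results[stage]}ms > ${THRESHOLDS_MS[stage]}ms`);
  }
  // Any regression past a threshold exits non-zero and blocks the PR.
  process.exit(failures.length ? 1 : 0);
}

main();
```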
5. Architecture Diagrams
5.1 Current POC
5.2 Joplin Integration
5.3 Sequence Diagram
6. Implementation Plan (12 Weeks)
Community Bonding
- Finalize design with mentors
- Confirm plugin API usage boundaries
- Lock evaluation dataset and baseline metrics
Week 1
- Plugin scaffold and dev workflow
- Register commands and panel shell
Success criteria:
- Plugin loads reliably in Joplin dev profile
Week 2
- Note ingestion via joplin.data
- Initial full indexing command
Success criteria:
- End-to-end ingest + index for full notebook selection
Week 3
- Chunking improvements (heading-aware + fallback)
- Metadata schema for citations and updates
Success criteria:
- Deterministic chunk IDs and traceable source mapping
Week 4
- Embeddings pipeline integration in plugin flow
- Batch indexing optimization
Success criteria:
- Stable indexing with realistic personal corpus
Week 5
- LanceDB integration in plugin runtime
- Retrieval API stabilization
Success criteria:
- Retrieval latency target for local queries
Week 6
- Hybrid retrieval + RRF in plugin pipeline
- Baseline quality comparison against vector-only retrieval
Success criteria:
- Improved retrieval relevance on evaluation set
Week 7
- Reranking stage integration + tuning
- Latency/quality tradeoff controls
Success criteria:
- Better top-3 precision with acceptable latency budget
Midterm Evaluation
- Working retrieval + answer path in plugin panel
- Demo with grounded citations
Success criteria:
- Plugin demo supports end-to-end ask flow with citations from indexed notes
- Retrieval diagnostics are available for mentor review
Week 8
- LLM provider abstraction (Ollama default, optional cloud)
- Grounded prompt + abstention behavior
Success criteria:
- User can switch provider configuration through settings
- System abstains clearly when context is insufficient
Week 9
- Citation UX: clickable source jumps to notes
- Streaming or progressive response rendering (if feasible)
Success criteria:
- Clicking a citation opens the corresponding note context reliably
- Response rendering is incremental or clearly non-blocking for longer generations
Week 10
- Incremental indexing using note/sync events
- Debounce and queue strategy
Success criteria:
- Changed notes are reindexed without full-corpus rebuild
- Frequent edits do not trigger reindex storms
Week 11
- Hardening, edge cases, cross-platform checks (desktop targets)
- Test expansion and benchmark reporting
Success criteria:
- Smoke and integration checks pass on representative desktop environments
- Benchmark report includes quality and latency comparison against baseline
Week 12
- Documentation and final cleanup
- Final evaluation report and deliverables
Success criteria:
- User and developer docs are complete and reproducible
- Final deliverables are packaged with demo instructions and evaluation summary
7. Deliverables
- Joplin desktop plugin with AI chat panel
- Hybrid + RRF + reranking retrieval pipeline
- Local-first model support with optional cloud providers
- Source-grounded answer generation with citations
- Incremental indexing support
- Test suite, benchmark scripts, and documentation
8. Risks and Mitigation
| Risk | Mitigation |
|---|---|
| Plugin runtime constraints | Keep thin adapter; isolate heavy logic in service/module |
| Latency spikes on large corpora | Use POC baseline (14ms retrieval on 66 chunks), then enforce targets with profiling: retrieval < 500ms on 10,000 chunks via indexed search, bounded top-k, caching, and prompt clipping |
| Model setup friction | Guided checks, clear fallback messaging, optional providers |
| Retrieval quality drift | Maintain reproducible eval set and track retrieval Precision@3 and MRR against vector-only baseline with diagnostics snapshots |
9. Availability
- Weekly availability: 30 hours
- Time zone: IST
- Commitments: Minor academic workload
- Communication: Weekly progress updates and active mentor collaboration
10. References
- Joplin architecture: Joplin architecture | Joplin
- Plugin system architecture: Plugin system architecture | Joplin
- Plugin development guide: Getting started with plugin development | Joplin
- Joplin Plugin API root: joplin | Joplin Plugin API Documentation
- joplin.data API: joplin.data | Joplin Plugin API Documentation
- joplin.workspace API: joplin.workspace | Joplin Plugin API Documentation
- joplin.views API: joplin.views | Joplin Plugin API Documentation
Dear mentors, your feedback would be greatly appreciated.


