GSoC 2026 Proposal Draft – Idea 4: Chat with Your Note Collection Using AI – Gaurav Dhakad
Links

| Link | Destination |
|---|---|
| Project idea | gsoc/ideas.md at master · joplin/gsoc · GitHub |
| GitHub | Gauravmy (Gaurav Dhakad) · GitHub |
| LinkedIn | https://linkedin.com/in/gaurav-dhakad |
| Joplin PR | #14601 – Prevent duplicate tags when tagging a note |
| Other contributions | Hyperswitch, llm-app, Pathway (AI pipelines) |
1. Introduction
**Who I am:** I'm Gaurav Dhakad, a third-year B.Tech student in Computer Science (AI & ML specialisation) at Sanskriti University, Mathura. I graduate in June 2027.
**Technical background:** My main stack is TypeScript, React, and Python. On the AI/ML side I've worked with TensorFlow, PyTorch, NLP libraries, and vector search systems. I've built and deployed projects including a gesture-controlled healthcare assistant (Python + OpenCV), a facial recognition payment system (TypeScript + React + Firebase), and demand forecasting models for logistics. I've solved 1000+ DSA problems across LeetCode and GeeksforGeeks and been a finalist in 7 national hackathons, including SIH 2025, Flipkart GRiD, and Walmart Sparkathon.
**Open source experience:**

- Joplin: PR #14601, which fixed duplicate tag creation when tagging a note with an existing tag
- Hyperswitch, llm-app, Pathway: contributions to enterprise AI pipeline and payments infrastructure
- Co-founder of AI Digitals: mentoring 300+ students in AI and software development
**Why this project:** I'm applying for Idea 4 because it maps directly onto what I've already built: RAG systems, NLP pipelines, and React UIs. This is not a topic I would be learning from scratch during the summer.
2. Project Summary
The problem
Joplin users build up large note collections over time — sometimes thousands of clipped pages, personal notes, and research. The only way to retrieve anything today is keyword search. If you don't remember the exact term, you're stuck. There's no way to ask your notes an open-ended question and get a meaningful answer.
The solution
A Joplin plugin that lets users have a natural-language conversation with their entire note collection. You type a question, the plugin finds the most relevant parts of your notes using vector similarity search, and an LLM generates a grounded answer with citations that link directly back to the source notes. You can ask follow-up questions to refine the answer.
Key properties

- Runs fully locally by default; no data leaves your machine
- Uses Joplin's own plugin API; no external app or service required
- Supports both local LLMs (Ollama) and optional remote APIs (user's own key)
- Multi-turn conversation with persistent context
**Out of scope:** image and attachment content, real-time per-keystroke indexing, and the mobile UI.
3. Technical Approach
Architecture overview
The system is a four-stage RAG (Retrieval-Augmented Generation) pipeline:
**Stage 1: Indexer.** Reads all notes through the Joplin Data API. Splits them into overlapping chunks using a markdown-aware chunker (so headings, code blocks, and list items don't get split awkwardly). Generates vector embeddings via Ollama's nomic-embed-text model. Stores embeddings and metadata in LanceDB, an embedded vector store that needs no server and works cross-platform.
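A minimal sketch of the chunking idea, assuming illustrative names and size parameters (`chunkMarkdown`, `maxChars`, `overlap` are not final); a real implementation would also avoid splitting inside fenced code blocks and lists:

```ts
// Sketch of a markdown-aware chunker. Names and parameters are illustrative.
// Splits on headings first so a chunk never straddles a section boundary,
// then packs each section into overlapping fixed-size windows.
interface Chunk {
  noteId: string;
  heading: string; // nearest enclosing heading, kept as retrieval metadata
  text: string;
}

function chunkMarkdown(noteId: string, body: string, maxChars = 1500, overlap = 200): Chunk[] {
  const chunks: Chunk[] = [];
  const sections = body.split(/^(?=#{1,6}\s)/m); // split at heading starts
  for (const section of sections) {
    const heading = (section.match(/^#{1,6}\s+(.*)/)?.[1] ?? '').trim();
    let start = 0;
    while (start < section.length) {
      const end = Math.min(start + maxChars, section.length);
      chunks.push({ noteId, heading, text: section.slice(start, end) });
      if (end === section.length) break;
      start = end - overlap; // overlap preserves context across window edges
    }
  }
  return chunks;
}
```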
**Stage 2: Retriever.** When a user asks a question, the question is embedded the same way. An approximate nearest-neighbour search then returns the top-K most semantically relevant chunks from the index.
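A sketch of query-time retrieval, assuming the `@lancedb/lancedb` TypeScript client and Ollama's `/api/embeddings` endpoint; the exact client API may differ between versions, and the index directory and table names are placeholders:

```ts
import * as lancedb from '@lancedb/lancedb';

// Embed a piece of text with Ollama's local embeddings endpoint.
async function embed(text: string): Promise<number[]> {
  const res = await fetch('http://localhost:11434/api/embeddings', {
    method: 'POST',
    body: JSON.stringify({ model: 'nomic-embed-text', prompt: text }),
  });
  return (await res.json()).embedding;
}

// Return the top-K chunks most similar to the question.
async function retrieve(question: string, k = 8) {
  const vector = await embed(question);
  const db = await lancedb.connect('chat-index'); // placeholder index directory
  const table = await db.openTable('chunks');     // placeholder table name
  // Approximate nearest-neighbour search over the stored chunk embeddings.
  return table.search(vector).limit(k).toArray();
}
```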
**Stage 3: Generator.** The retrieved chunks and the user's question are sent to an LLM. Ollama running locally is the default; users can optionally connect OpenAI or Anthropic by adding their own API key in settings. The response is streamed back to the UI as it arrives.
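A sketch of the grounded-generation step against Ollama's streaming `/api/generate` endpoint, which returns one JSON object per line; the prompt template and the `llama3` model name are illustrative:

```ts
// Stream a grounded answer from a local Ollama model (model name illustrative).
async function* generateAnswer(question: string, chunks: { text: string }[]) {
  const context = chunks.map((c, i) => `[${i + 1}] ${c.text}`).join('\n---\n');
  const prompt = `Answer using only the notes below and cite them by number.\n\nNotes:\n${context}\n\nQuestion: ${question}`;
  const res = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    body: JSON.stringify({ model: 'llama3', prompt, stream: true }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffered = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffered += decoder.decode(value, { stream: true });
    let nl: number;
    while ((nl = buffered.indexOf('\n')) >= 0) {
      const line = buffered.slice(0, nl);
      buffered = buffered.slice(nl + 1);
      if (line.trim()) yield JSON.parse(line).response as string; // text fragment
    }
  }
}
```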
**Stage 4: Chat UI.** A React panel inside a Joplin plugin sidebar. It shows the conversation history, renders the streamed answer in real time, and displays clickable citations that open the referenced note directly inside Joplin.
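A minimal sketch of the panel component itself, leaving out the webview messaging glue a Joplin plugin panel needs; the `ask` prop, which would wrap the retrieval and generation stages above, is an assumption:

```tsx
import React, { useState } from 'react';

type Message = { role: 'user' | 'assistant'; text: string };

// Minimal chat panel: streamed tokens are appended to the last assistant message.
export function ChatPanel({ ask }: { ask: (q: string) => AsyncIterable<string> }) {
  const [messages, setMessages] = useState<Message[]>([]);
  const [input, setInput] = useState('');

  async function onSend() {
    const question = input;
    setInput('');
    setMessages(m => [...m, { role: 'user', text: question }, { role: 'assistant', text: '' }]);
    for await (const token of ask(question)) {
      setMessages(m => {
        const next = [...m];
        const last = next[next.length - 1];
        next[next.length - 1] = { ...last, text: last.text + token };
        return next;
      });
    }
  }

  return (
    <div>
      {messages.map((m, i) => (
        <p key={i}><b>{m.role}:</b> {m.text}</p>
      ))}
      <input value={input} onChange={e => setInput(e.target.value)} />
      <button onClick={onSend}>Ask</button>
    </div>
  );
}
```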
Joplin integration specifics

Notes are fetched via the Data API:

```ts
joplin.data.get(['notes'], { fields: ['id', 'title', 'body'] })
```

Citations open notes via:

```ts
joplin.commands.execute('openNote', noteId)
```

The settings panel covers LLM backend selection, API key input (for remote backends), and a manual index rebuild button.
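A sketch of how those settings could be registered through the Joplin plugin settings API; the section and key names are placeholders, not final:

```ts
import joplin from 'api';
import { SettingItemType } from 'api/types';

// Register the plugin's settings section and items (key names are placeholders).
export async function registerSettings() {
  await joplin.settings.registerSection('chatWithNotes', {
    label: 'Chat with your notes',
    iconName: 'fas fa-comments',
  });
  await joplin.settings.registerSettings({
    llmBackend: {
      value: 'ollama',
      type: SettingItemType.String,
      isEnum: true,
      options: { ollama: 'Ollama (local)', openai: 'OpenAI', anthropic: 'Anthropic' },
      public: true,
      section: 'chatWithNotes',
      label: 'LLM backend',
    },
    remoteApiKey: {
      value: '',
      type: SettingItemType.String,
      secure: true, // stored securely rather than in plain settings where supported
      public: true,
      section: 'chatWithNotes',
      label: 'API key (remote backends only)',
    },
  });
}
```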
Technology choices
| Component | Choice | Reason |
|---|---|---|
| Plugin framework | Joplin Plugin API (TypeScript) | Required for Joplin plugins |
| UI | React.js | Already used in Joplin plugins |
| Vector store | LanceDB | Embedded, no server, cross-platform |
| Embeddings | Ollama nomic-embed-text | Local, lightweight, good quality |
| LLM backend | Ollama (default) / OpenAI / Anthropic | Local-first, user chooses |
| Text chunking | Custom TypeScript chunker | Markdown-aware, overlapping windows |
Challenges and how I'll handle them
| Challenge | Mitigation |
|---|---|
| Large vaults slow to index | Batch processing with a progress bar; incremental re-index on note updates (sketch after this table) |
| Context window limits | Retrieve only top-K chunks; truncate if total tokens exceed limit |
| Privacy | All processing local by default; remote API only if user explicitly adds a key |
| First-run embedding model download | Clear setup guide; progress shown in UI |
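One possible shape for the incremental re-index, shown here as a poll that compares each note's `updated_time` against the last index run rather than a change-event stream; pagination follows the Data API's `page`/`has_more` convention:

```ts
import joplin from 'api';

// Collect notes modified since the last index run so only they are re-embedded.
async function notesChangedSince(lastIndexedAt: number) {
  const changed: { id: string; title: string; body: string }[] = [];
  let page = 1;
  while (true) {
    const res = await joplin.data.get(['notes'], {
      fields: ['id', 'title', 'body', 'updated_time'],
      page,
    });
    changed.push(...res.items.filter((n: any) => n.updated_time > lastIndexedAt));
    if (!res.has_more) break;
    page += 1;
  }
  return changed;
}
```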
Testing plan
- Unit tests (Jest): chunker logic, embedding pipeline, retrieval ranking (example after this list)
- Integration tests: mocked Joplin Data API, full RAG pipeline with sample notes
- Manual QA: vaults of 100, 1000, and 5000 notes
- Performance benchmarks: index build time and query latency on a mid-range machine
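An illustrative Jest test for the chunker sketched in section 3; the module path is hypothetical:

```ts
import { chunkMarkdown } from '../src/chunker'; // hypothetical module path

describe('chunkMarkdown', () => {
  const body = '# Heading\n\n' + 'word '.repeat(1000);

  it('keeps every chunk within the size limit', () => {
    for (const chunk of chunkMarkdown('note-1', body, 1500, 200)) {
      expect(chunk.text.length).toBeLessThanOrEqual(1500);
    }
  });

  it('overlaps consecutive chunks', () => {
    const chunks = chunkMarkdown('note-1', body, 1500, 200);
    // The start of each chunk should repeat the tail of the previous one.
    expect(chunks[1].text.startsWith(chunks[0].text.slice(-200))).toBe(true);
  });
});
```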
4. Implementation Plan
| Period | Focus | Tasks |
|---|---|---|
| Community bonding (May) | Research & design | Read Joplin plugin API docs, study existing plugins, align with mentor on architecture, write technical design doc |
| Week 1–2 | Foundation | Plugin scaffold (TS + React), note fetcher via Data API, markdown-aware chunker, unit tests |
| Week 3–4 | Indexing | Ollama embedding integration, LanceDB setup, full indexing pipeline, integration tests |
| Week 5–6 | Retrieval & generation | ANN retriever, LLM connection, basic streaming end-to-end |
| Week 7–8 | Chat UI | Message history, streaming display, loading states, error handling |
| Week 9–10 | Polish | Citation links to source notes, settings panel (LLM choice, API key, rebuild) |
| Week 11 | Scale & performance | Incremental re-indexing on note changes, large vault optimisation |
| Week 12 | Documentation & QA | User docs, developer docs, final testing, bug fixes |
| Final week | Wrap-up | Code cleanup, final evaluation submission, publish to Joplin plugin marketplace |
5. Deliverables
Required
- Joplin plugin installable from the marketplace
- Local vector index of all notes, built and stored on device
- Chat UI panel with multi-turn conversation and streaming responses
- Cited answers with clickable links that open source notes in Joplin
- Configurable LLM backend (local Ollama plus optional remote with a user-supplied API key)
- Incremental index updates when notes are added or edited
- Unit and integration test suite
Documentation
- User guide: installing Ollama, choosing a model, rebuilding the index, usage examples
- Developer guide: architecture overview, how the RAG pipeline works, and how to add new LLM backends (interface sketch after this list)
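An illustrative shape for the backend abstraction the developer guide would document; the interface name and the non-streaming OpenAI adapter below are assumptions, not final design:

```ts
// Hypothetical backend abstraction: each provider implements one streaming
// method, so adding a backend means writing a single adapter.
export interface LlmBackend {
  complete(prompt: string): AsyncIterable<string>;
}

// Example adapter wrapping OpenAI's chat completions REST endpoint
// (non-streaming here for brevity; the model name is illustrative).
export class OpenAiBackend implements LlmBackend {
  constructor(private apiKey: string, private model = 'gpt-4o-mini') {}

  async *complete(prompt: string): AsyncIterable<string> {
    const res = await fetch('https://api.openai.com/v1/chat/completions', {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: this.model,
        messages: [{ role: 'user', content: prompt }],
      }),
    });
    const data = await res.json();
    yield data.choices[0].message.content as string; // single fragment when not streaming
  }
}
```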
6. Availability
| Item | Details |
|---|---|
| Weekly hours | 40–45 hours/week |
| Time zone | IST (UTC+5:30) |
| Exams | Late May, approx. one week; the mentor will be informed in advance and the hours made up on either side |
| Other commitments | None; no internship or major obligations during the coding period |
Why me specifically
I already have a PR in Joplin (#14601), so I know how the codebase is structured and how the plugin system works. I've built RAG pipelines before (llm-app contributions, AI Demand Forecasting project). I know TypeScript and React from real projects, not tutorials. And as someone who runs an AI community and mentors 300+ people, I'm used to explaining technical decisions clearly, which matters when working with mentors and writing documentation. I'm not proposing this because it looks good on paper. I can build it.