Links
GitHub: github.com/Codebmk (Belinda Marion Kobusingye)
List of Open Source Contributions: github.com/airqo-platform/AirQo-frontend/pulls?q=is:pr+is:closed+author:Codebmk
1. Introduction
I am Belinda Marion Kobusingye, a Computer Science graduate based in Uganda. I hold a Bachelor's degree in Computer Science and recently completed a Master's degree in Computer Science from Makerere University (graduated February 2026).
Over the past four years I have been a core maintainer of AirQo, an open-source air quality monitoring platform developed at Makerere University. In that role I have built key products from the ground up, including the Vertex device management web platform and its Electron desktop wrapper, giving me substantial experience with TypeScript, React, and desktop application architecture.
Beyond systems work, I have explored AI integration through personal projects involving the Anthropic Claude API and vector embeddings, including a document Q&A prototype built with LangChain.js, which maps directly to the RAG pipeline at the core of JoplinAI. I am also an active Joplin user.
2. Project Summary
Problem: Joplin users often accumulate far more knowledge than they ever use. Notes are captured diligently but rarely revisited, synthesised, or acted upon. Tiago Forte's Building a Second Brain methodology describes a knowledge system that actively helps you think and create rather than merely store; in practice, Joplin gets the capture half right but offers no tools to leverage what has been collected. Keyword search fails for conceptual queries and cross-note synthesis. Some users work around this by exporting notes to Google's NotebookLM, but that requires manual exports, sends data to the cloud, and severs the link between AI output and the original Joplin notes. There is no native, privacy-preserving way to hold an intelligent conversation with one's own Joplin knowledge base.
Why it matters: Joplin is a trusted, privacy-first tool for thousands of users who specifically choose it to keep their data local and under their control. These users deserve the same AI-assisted knowledge interaction that cloud tools like Notion AI and NotebookLM offer, without sacrificing their privacy or leaving the application they already rely on.
What will be implemented: JoplinAI is a Joplin desktop application plugin that adds a conversational AI assistant panel. It uses a Retrieval-Augmented Generation (RAG) pipeline: notes are indexed using local semantic embeddings, and relevant passages are retrieved on each query. Only the retrieved note excerpts, not the full note collection, are sent to the Anthropic Claude API to generate grounded, source-attributed answers.
Every response cites the exact Joplin notes it drew from, and citation links open the referenced note directly in the application. Chat sessions are automatically saved as notes in a dedicated notebook, ensuring conversations are never lost and feed back into the knowledge base.
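The prompt-assembly half of this pipeline can be sketched as a pure function: retrieved excerpts are framed with numbered source tags that the model's answer can cite. This is an illustrative sketch only; `Excerpt`, `buildContext`, and the exact prompt wording are assumptions to be settled during community bonding, not the plugin's actual API.

```typescript
// Illustrative: frame retrieved note excerpts as numbered sources so the
// model can cite them as [1], [2], ... in its answer.
interface Excerpt {
  noteTitle: string;
  text: string;
}

export function buildContext(excerpts: Excerpt[]): string {
  const sources = excerpts
    .map((e, i) => `[${i + 1}] ${e.noteTitle}\n${e.text}`)
    .join('\n\n');
  return (
    'Answer only from the note excerpts below. ' +
    'Cite sources as [n] after each claim.\n\n' +
    sources
  );
}
```

The numbered tags give the chat panel a stable handle for turning each `[n]` in the response back into a clickable link to the source note.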
Expected outcome: A fully functional, open-source Joplin desktop plugin, published to the Joplin plugin repository, that allows users to query their entire note collection in natural language and receive grounded, cited answers, with a privacy-conscious design where note indexing and retrieval happen locally and only relevant excerpts are shared with the Anthropic API.
Out of scope: Mobile support, synchronisation of the vector index across devices, and fine-tuning or hosting of AI models. The summarise-and-save feature and a proactive note review engine are stretch goals if time permits.
3. Technical Approach
Architecture: The plugin is built using Joplin's official plugin framework (TypeScript, React). It consists of four components:
| Component | Description |
|---|---|
| Note Indexer | Reads all notes via the Joplin Data API, splits content into ~400-token overlapping chunks, computes embeddings using transformers.js (all-MiniLM-L6-v2, running fully locally in the plugin webview), and stores vectors in a persistent IndexedDB store. Re-indexes incrementally on note save events. |
| Semantic Search | On each user query, the same embedding model encodes the question and cosine similarity search selects the top-k most relevant chunks. Results carry note title and notebook path for source attribution. |
| Claude API Layer | Retrieved chunks are injected into a structured system prompt sent to claude-sonnet-4. The model is instructed to answer only from the provided note context and to cite sources. Responses stream token-by-token into the chat panel. Citation links call joplin.commands.openNote() to open the referenced note directly. |
| Conversation Store | Each session is saved as a Markdown note in an 'AI Conversations' notebook via the Joplin Data API. Notes are picked up by the indexer on the next re-index cycle, enriching future queries with past Q&A context. |
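The chunking step in the Note Indexer can be sketched as a pure function. This is a minimal sketch in which whitespace-separated words approximate tokens; `chunkNote` and the `Chunk` shape are illustrative names, and the real implementation would use the embedding model's own tokenizer for the ~400-token windows.

```typescript
// Illustrative: split a note body into overlapping windows of roughly
// `chunkSize` tokens, where words stand in for tokens.
export interface Chunk {
  noteId: string;
  text: string;
  index: number;
}

export function chunkNote(
  noteId: string,
  body: string,
  chunkSize = 400,
  overlap = 50,
): Chunk[] {
  const step = chunkSize - overlap;
  if (step <= 0) throw new RangeError('overlap must be smaller than chunkSize');
  const words = body.split(/\s+/).filter(Boolean);
  const chunks: Chunk[] = [];
  for (let start = 0, i = 0; start < words.length; start += step, i++) {
    chunks.push({
      noteId,
      text: words.slice(start, start + chunkSize).join(' '),
      index: i,
    });
    // Stop once a window reaches the end of the note.
    if (start + chunkSize >= words.length) break;
  }
  return chunks;
}
```

The overlap means each chunk repeats the tail of its predecessor, so a sentence falling on a chunk boundary is still retrievable as a whole from at least one chunk.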
Changes to the Joplin codebase: This project is implemented entirely as a plugin and does not require changes to the Joplin core codebase. It uses the published Joplin Plugin API and Data API exclusively.
Libraries and technologies: TypeScript, React, transformers.js (local embeddings), IndexedDB (vector store), Anthropic Claude API (claude-sonnet-4), Joplin Plugin API, Joplin Data API.
Potential challenges:
- Performance on large note collections: indexing thousands of notes in the plugin webview may be slow; this will be addressed by batched processing and a visible progress indicator.
- transformers.js model load time: the embedding model (~25MB) must be downloaded once and cached locally on first run.
- Plugin sandbox restrictions: the plugin webview restricts certain Node.js APIs; all storage will use IndexedDB rather than the filesystem to stay within the webview context.
- Claude API latency: streaming responses will be used to ensure the UI feels responsive even before the full answer arrives.
Testing strategy: Unit tests for the chunking and embedding pipeline using Jest. Integration tests for the Joplin Data API interactions using Joplin's plugin test framework. Manual end-to-end testing against a real Joplin note collection of varying sizes.
Documentation plan: A user-facing README covering installation, API key setup, and usage. A developer-facing CONTRIBUTING document explaining the architecture for future contributors. Inline code comments throughout.
4. Implementation Plan
| Period | Tasks | Hours |
|---|---|---|
| Community Bonding (ongoing) | Meet mentors. Study Joplin Plugin API and Data API in depth. Agree on final architecture and data model. Set up development environment, CI, and testing scaffold. | 30 |
| Weeks 1 to 3 | [Required] Plugin scaffold with settings UI: Claude API key input, retrieval parameter controls. Joplin Data API integration to fetch all notes and notebooks. Chunking logic with configurable overlap. Unit tests for chunking. Initial documentation skeleton. | 60 |
| Weeks 4 to 7 | [Required] Local embedding pipeline using transformers.js. IndexedDB vector store with cosine similarity search. Incremental re-indexing on note save events. Claude API integration with streaming. End-to-end query returning cited passages. Unit and integration tests for RAG pipeline. | 100 |
| Weeks 8 to 9 | [Required] React chat panel: streaming response display, message history, source citation links that open the referenced note in Joplin. Conversation persistence: each session auto-saved as a Markdown note in AI Conversations notebook. | 70 |
| Weeks 10 to 11 | [Required] Error handling, rate-limit back-off, indexing progress indicator. Full user and developer documentation. Integration tests. Plugin submission to Joplin plugin repository. [Optional] Summarise-and-save. [Optional] Proactive review engine. | 70 |
| Week 12 | Mentor review, bug fixes, final submission preparation. | 20 |
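The rate-limit back-off planned for weeks 10 to 11 could follow a standard capped-exponential scheme. The sketch below shows only the delay computation; `computeDelay` and its defaults are illustrative assumptions, and the plugin would sleep for this many milliseconds before retrying a rate-limited (HTTP 429) request.

```typescript
// Illustrative: capped exponential back-off for retrying rate-limited
// API calls. Delay doubles per attempt up to a hard ceiling.
export function computeDelay(
  attempt: number,
  baseMs = 500,
  maxMs = 30_000,
): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}
```

A production version would typically add random jitter to the computed delay so that concurrent clients do not retry in lockstep.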
5. Deliverables
Required:
- Fully functional JoplinAI desktop plugin published to the Joplin plugin repository.
- Local RAG pipeline: note ingestion, chunking, local embedding (transformers.js), IndexedDB vector store, cosine similarity retrieval.
- Claude API integration with streaming responses and source-attributed citations that open the referenced note in Joplin.
- Conversation persistence: each chat session saved automatically as a Joplin note in a dedicated AI Conversations notebook.
- Settings UI: API key management and retrieval parameter configuration.
- Unit and integration test suite.
- User documentation (README) and developer documentation (CONTRIBUTING, inline comments).
Optional / stretch goals (if time permits):
- Summarise-and-save: generate a structured outline or summary from query results and write it back as a new Joplin note.
- Proactive review engine: surface notes untouched beyond a configurable threshold and invite the user to update them via dialogue.
6. Availability
- Weekly availability: 20 hours per week throughout the GSoC coding period (June 2 to August 25).
- Timezone: East Africa Time (EAT, UTC+3).
- I have no other internships, examinations, or commitments during the programme that would affect my availability.
- I am comfortable with asynchronous communication via GitHub, email, and the Joplin Discord, and will provide weekly written updates to my mentors.