GSoC 2026 Proposal Draft – Idea 4: Chat with your note collection using AI – Rebecca Ayodele
Links
- Idea Link:
- GitHub:
- Forum introduction: https://discourse.joplinapp.org/t/welcome-to-gsoc-2026-with-joplin/48974/85
- Pull requests: https://github.com/laurent22/joplin/pull/14883
- Other relevant experience:
1. Introduction
I am a third-year Computer Science and Mathematics student at Obafemi Awolowo University, Nigeria. I have about 4 years of experience in frontend development working with React, TypeScript, JavaScript and Next.js. Recently I have been expanding into AI and machine learning. I previously built a career AI application using Next.js, TypeScript and the OpenRouter API, integrating multiple LLMs including Nemotron, Mistral and Gemini with practical decisions around token limits and cost. This is my first time contributing to open source and I am starting with Joplin.
2. Project Summary
What problem it solves
- A user has hundreds of notes but cannot remember the exact title or keywords needed to find them through regular search
- A user wants to deeply understand a topic from their notebook
- Related topics are spread across different notebooks and the user wants them all in one place to gain insights
Although a similar idea exists in the Jarvis plugin, it has key limitations. Jarvis is a third-party plugin that is not officially maintained by Joplin, meaning it can break when Joplin updates. It is also complex to set up, stores embeddings inside each note's hidden metadata which adds unnecessary data to every note, and does not work well on mobile. Idea 4 would be an official, simpler implementation designed for both desktop and mobile from the start.
What will be implemented
- Core chat interface
Users access the chat through an AI button that appears when they tap the add icon. Clicking it opens a chat panel with an input field that accepts text. When a user asks a question, the AI retrieves relevant notes using RAG and generates an answer. If the answer draws from multiple notes, each part of the response is labelled with its source — for example Note 1, Note 2 — similar to how NotebookLM handles citations. Clicking a citation label shows a "Go to note" option that takes the user directly to that note, and pressing back returns them to the chat. A resources panel on the side of the chat lists all notes referenced in the conversation, giving users another way to navigate to their sources. The chat history is also saved so users can return to previous conversations.
- Offline support
The plugin works fully without internet by using a local LLM through Ollama, meaning the user's note data never leaves their device. This aligns with Joplin's core philosophy of being an offline-first application and reduces security and privacy risk. When Ollama is not installed, the plugin falls back gracefully to a cloud LLM through OpenRouter.
Expected Outcome
By the end of the project, Joplin users will be able to open a dedicated AI chat panel and have natural conversations with their notes. Questions will return answers with numbered citations linking directly to the source notes, similar to how NotebookLM works. Users will be able to navigate to any referenced note and return to the chat seamlessly. A resources panel will list all notes referenced in the conversation. Chat history will be saved for future reference. The plugin will work fully offline using Ollama, keeping all note data on the user's device, with a fallback to cloud LLM via OpenRouter when needed.
Desktop UX screenshot
3. Technical Approach
RAG Pipeline
Chunking
When a note is processed, it cannot be sent to the embedding model as a whole. Sending an entire note at once would dilute its meaning — the resulting vector would represent an average of everything in the note, making it harder to retrieve specific information accurately. Embedding models also have token limits, so long notes must be split into smaller pieces called chunks.
Notes are split at heading boundaries first, since content under the same heading is usually related. If a section has no headings, the plugin splits by paragraph and semantic breaks — points where the topic shifts. If a chunk is still too long after that, a sliding window with overlap is used to ensure context is preserved across boundaries. Each chunk is stored alongside its metadata: the note title, heading path, line number, and note ID. This metadata is what powers the citation system later.
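A minimal sketch of this splitting logic is below, using a word count as a stand-in for real token counts; paragraph and semantic-break splitting are omitted for brevity, and all names and limits are illustrative rather than final plugin code:

```typescript
// Illustrative chunker: heading boundaries first, then a sliding window
// with overlap. Word counts approximate token counts for this sketch.
interface Chunk {
  text: string;
  noteId: string;
  headingPath: string;
  startLine: number; // first line of the section this chunk came from
}

const MAX_WORDS = 200; // stand-in for the embedding model's token limit
const OVERLAP = 30;    // words shared between consecutive windows

function chunkNote(noteId: string, body: string): Chunk[] {
  const chunks: Chunk[] = [];
  let heading = '';
  let startLine = 0;
  let buffer: string[] = [];

  const flush = () => {
    const words = buffer.join(' ').split(/\s+/).filter(Boolean);
    for (let i = 0; i < words.length; i += MAX_WORDS - OVERLAP) {
      chunks.push({
        text: words.slice(i, i + MAX_WORDS).join(' '),
        noteId, headingPath: heading, startLine,
      });
      if (i + MAX_WORDS >= words.length) break; // last window reached
    }
    buffer = [];
  };

  body.split('\n').forEach((line, i) => {
    if (/^#{1,6}\s/.test(line)) { // a Markdown heading closes the previous section
      flush();
      heading = line.replace(/^#{1,6}\s*/, '');
      startLine = i + 1; // content begins after the heading line
    } else {
      buffer.push(line);
    }
  });
  flush();
  return chunks;
}
```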
Embedding
Once a note is chunked, each chunk is sent to the embedding model one by one or in parallel batches for speed. The model converts the text into a vector — a list of numbers that represents the meaning of that chunk. These vectors are what get stored and searched later.
The embedding model used is BGE-small-en-v1.5, running locally via Transformers.js. This model runs entirely in JavaScript with no Python or internet connection required, making it suitable for Joplin's offline-first philosophy. It is small enough to download once and fast enough to run on a typical user's device while still producing high quality embeddings for retrieval tasks.
To ensure model consistency, BGE-small is used for all embeddings — both for indexing notes and for converting the user's query at search time. Since the same model is always used, the vectors always exist in the same space and comparisons are always meaningful. OpenRouter is only used for LLM generation, not for embeddings, so switching between online and offline modes never creates a model mismatch.
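A sketch of the embedding call using Transformers.js follows. The `Xenova/bge-small-en-v1.5` model id refers to the community ONNX conversion of BGE-small and is an assumption about how the model would be packaged:

```typescript
import { pipeline } from '@xenova/transformers';

let extractor: any; // lazily created feature-extraction pipeline

// Convert a batch of chunk texts into 384-dimensional vectors.
async function embed(texts: string[]): Promise<number[][]> {
  // The model is downloaded once and cached locally, so later runs work offline.
  extractor ??= await pipeline('feature-extraction', 'Xenova/bge-small-en-v1.5');
  // Mean pooling plus normalization yields one unit-length vector per input,
  // which lets cosine similarity at search time reduce to a dot product.
  const output = await extractor(texts, { pooling: 'mean', normalize: true });
  return output.tolist();
}
```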
Storage
Embeddings are stored in a local SQLite database using the sqlite-vec extension. SQLite was chosen over other options because it is fast, simple, and well suited for local desktop applications. Storing embeddings in a JSON file would be too slow for large collections, and storing them in Joplin's note userData — as the Jarvis plugin does — adds hidden data to every note and increases complexity. SQLite keeps the index in a dedicated file that is separate from the notes themselves.
Since SQLite involves disk access, all vectors are loaded into RAM on the first search. All subsequent searches happen entirely in memory, making them fast. Chat history and conversation records are also stored in SQLite alongside the embeddings.
Because SQLite does not sync automatically across devices, when a user opens Joplin on a new device the plugin detects there is no index and rebuilds it automatically in the background, showing a progress indicator so the user knows what is happening.
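A possible schema is sketched below, assuming a Node runtime with better-sqlite3 and the sqlite-vec npm package (the actual plugin runtime may require a different SQLite driver); table and column names are illustrative:

```typescript
import Database from 'better-sqlite3';
import * as sqliteVec from 'sqlite-vec';

const db = new Database('ai-chat-index.sqlite'); // separate file, not note data
sqliteVec.load(db); // registers the vec0 virtual-table module

db.exec(`
  CREATE TABLE IF NOT EXISTS chunks (
    id INTEGER PRIMARY KEY,
    note_id TEXT NOT NULL,
    note_title TEXT NOT NULL,
    heading_path TEXT,
    start_line INTEGER,
    content_hash TEXT NOT NULL      -- used to skip unchanged notes on re-index
  );
  -- rowid mirrors chunks.id; 384 dimensions matches BGE-small-en-v1.5 output
  CREATE VIRTUAL TABLE IF NOT EXISTS chunk_vectors USING vec0(embedding float[384]);
  CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY,
    conversation_id TEXT NOT NULL,
    role TEXT NOT NULL,             -- 'user' or 'assistant'
    body TEXT NOT NULL,
    created_at INTEGER NOT NULL
  );
`);
```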
Search
When a user types a question, the question text is converted into an embedding using the same BGE-small model used to index the notes. The result is a single vector representing the meaning of the question, which is then quantized to Q8 to match the format of the cached note vectors and make comparisons faster on CPU.
The search runs as a brute force scan against all note chunk vectors in RAM. Rather than using an approximate index like FAISS, brute force is preferred because for the typical size of a Joplin note collection it is already fast enough at 10 to 50 milliseconds, it guarantees 100% recall meaning no relevant note is ever missed, and it is simpler to implement and maintain. FAISS is designed for collections of millions of vectors and would add unnecessary complexity for a personal note application. Similarly, LangChain was not chosen because it adds heavy dependencies and abstracts away the specific controls needed for custom chunking, citation metadata and token budget management — a direct implementation gives more flexibility and control.
Similarity between vectors is measured using cosine similarity: a score of 1.0 means the vectors point in the same direction (identical meaning), 0 means they are perpendicular (no relation), and -1.0 means they point in opposite directions (opposite meaning).
During the scan a top-K heap keeps track of only the highest scoring chunks. As each chunk is scored, if it scores higher than the lowest result currently in the heap it replaces it. This means only K results are held in memory at once. Since results are individual chunks and not whole notes, they are grouped by note and their scores aggregated to produce a final ranking. Each result carries the note title, heading path, line number and note ID — this is what enables the citation system.
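A sketch of the scan is shown below. Because vectors are normalized at embedding time, cosine similarity reduces to a dot product; a small sorted array stands in for a true min-heap here, which is fine for small K:

```typescript
interface Scored { chunkId: number; score: number }

// Dot product of two unit vectors equals their cosine similarity.
function dot(a: Float32Array, b: Float32Array): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

// Brute-force top-K scan over the in-RAM vector cache.
function topK(query: Float32Array, cache: Map<number, Float32Array>, k: number): Scored[] {
  const heap: Scored[] = []; // kept sorted ascending; heap[0] is the current minimum
  for (const [chunkId, vec] of cache) {
    const score = dot(query, vec);
    if (heap.length < k) {
      heap.push({ chunkId, score });
      heap.sort((x, y) => x.score - y.score);
    } else if (score > heap[0].score) {
      heap[0] = { chunkId, score };           // evict the lowest-scoring result
      heap.sort((x, y) => x.score - y.score); // re-sort; O(k log k) but k is tiny
    }
  }
  return heap.reverse(); // highest score first
}
```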
Prompt Construction and Token Budget
Before sending to the LLM, a token budget is calculated. The LLM has a maximum context window and can only process a certain number of tokens at once. Tokens are reserved for the system message, the user's question, conversation history from previous turns, and the LLM's response. Whatever tokens remain are used for the retrieved note chunks. If the chunks exceed the available budget, as many as fit are included and the rest are truncated.
The final prompt contains four things: a role instruction telling the LLM it is a note assistant, a grounding instruction telling it to only answer from the provided notes and acknowledge when information is not available, the retrieved note chunks as context, and the user's actual question. Grounding is important because it prevents the LLM from generating answers from its training data rather than the user's actual notes.
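A sketch of the budget logic follows; the context-window size, response reserve and token estimator are illustrative placeholders (a real implementation would use the model's tokenizer):

```typescript
const CONTEXT_WINDOW = 8192;   // assumed model context size
const RESPONSE_RESERVE = 1024; // tokens kept free for the LLM's answer

// Crude estimate; a real tokenizer would replace this.
const estimateTokens = (text: string) =>
  Math.ceil(text.split(/\s+/).filter(Boolean).length * 1.3);

function buildPrompt(system: string, history: string, question: string, chunks: string[]): string {
  let budget = CONTEXT_WINDOW - RESPONSE_RESERVE
    - estimateTokens(system) - estimateTokens(history) - estimateTokens(question);

  const context: string[] = [];
  for (const chunk of chunks) { // chunks arrive sorted by relevance
    const cost = estimateTokens(chunk);
    if (cost > budget) break;   // truncate: whatever no longer fits is dropped
    context.push(chunk);
    budget -= cost;
  }

  return [
    system,
    'Answer ONLY from the notes below. If the answer is not in them, say so.',
    ...context.map((c, i) => `[Note ${i + 1}]\n${c}`),
    history,
    `Question: ${question}`,
  ].join('\n\n');
}
```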
Generation and Citations
The prompt is sent to either OpenRouter for cloud generation or Ollama for local offline generation. The LLM generates the answer token by token so the user sees the response appearing gradually rather than waiting for the full answer. After the response, numbered citations show the user exactly which notes the answer came from. Clicking a citation shows a "Go to note" option that navigates directly to the source note.
When no relevant notes are found for a query, the plugin does not return an empty response. Instead it shows the user a message explaining that nothing was found in their notes and offers the option to answer from general knowledge instead. This gives the user control over whether they want the LLM to draw from its training data.
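A sketch of streaming from a local Ollama server through its REST API, which returns newline-delimited JSON, is shown below; the model name is a placeholder:

```typescript
// Yields response fragments as Ollama produces them.
async function* generateLocal(prompt: string): AsyncGenerator<string> {
  const res = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    body: JSON.stringify({ model: 'llama3.2', prompt, stream: true }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffered = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffered += decoder.decode(value, { stream: true });
    const lines = buffered.split('\n');
    buffered = lines.pop()!; // keep any partial line for the next read
    for (const line of lines) {
      if (!line.trim()) continue;
      const msg = JSON.parse(line);
      if (msg.response) yield msg.response; // one fragment at a time
    }
  }
}
```

If this request fails because Ollama is not running, the same prompt would be sent to OpenRouter instead, implementing the fallback described earlier.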
Architecture and Components
Frontend (UI) components:
- ChatPanel — the main chat interface. Users can type @ to mention a specific note and focus the conversation on that note only
- MessageInput — text input with @ mention support
- SourcesPanel — slide-in panel listing all notes referenced in the conversation
- HistoryPanel — previous conversations with their associated sources
- SettingsPage — API key configuration and index controls
- IndexStatus — displays indexing progress on first run or on a new device
Backend (logic) components:
- embeddings.ts — converts text to vectors using Transformers.js
- indexer.ts — handles chunking notes and building and updating the index
- search.ts — brute force search and top-K heap
- llm.ts — sends prompts to OpenRouter or Ollama
- database.ts — SQLite storage and RAM cache management
- promptBuilder.ts — constructs the prompt with token budget management
Integration with the Joplin Codebase
Since this is a plugin, it does not modify Joplin's core codebase. Instead it integrates through the Joplin Plugin API:
- joplin.data.get(['notes']) to read note content
- joplin.views.panels to create the chat panel
- joplin.commands.execute('openNote', noteId) to navigate to a note
- joplin.workspace.onNoteChange to detect when notes are edited and trigger re-indexing
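A minimal sketch of how these calls could fit together in the plugin entry point; the panel HTML and handler bodies are placeholders:

```typescript
import joplin from 'api';

joplin.plugins.register({
  onStart: async () => {
    // Chat panel
    const panel = await joplin.views.panels.create('aiChatPanel');
    await joplin.views.panels.setHtml(panel, '<div id="chat-root">Loading…</div>');

    // Trigger (debounced) re-indexing when a note changes
    await joplin.workspace.onNoteChange(async (event) => {
      // event.id identifies the changed note
    });

    // Paginated read of all notes for the initial index build
    let page = 1;
    let hasMore = true;
    while (hasMore) {
      const res = await joplin.data.get(['notes'], {
        fields: ['id', 'title', 'body'],
        page,
      });
      // ...chunk and embed res.items here...
      hasMore = res.has_more;
      page++;
    }
  },
});
```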
Libraries and Technologies
- TypeScript — main language for the plugin, consistent with Joplin's codebase
- React — building the chat UI components, which I have 4 years of experience with
- Transformers.js (Hugging Face) — running BGE-small-en-v1.5 locally in JavaScript for offline embeddings. Chosen because it runs ONNX models entirely in JavaScript with no Python needed, which is critical for a Joplin plugin environment
- SQLite with sqlite-vec — storing embeddings and chat history locally. Chosen over JSON for speed and over the userData approach for simplicity and cleanliness
- OpenRouter API — cloud LLM generation with access to multiple models through a single API key. I have production experience with this from my career AI app, where I integrated Nemotron, Mistral and Gemini
- Ollama — local LLM for fully offline generation, aligning with Joplin's offline-first philosophy
- Joplin Plugin API — reading notes, creating panels, navigating to notes
Potential Challenges
Challenge 1 — First-time indexing on a new device
Users with large note collections may experience a slow initial indexing process when opening Joplin on a new device. The plugin handles this by running indexing in the background while showing a progress indicator, allowing the user to continue using Joplin normally.
Challenge 2 — Token budget management
The LLM can only process a limited number of tokens at once. If the most relevant chunks exceed the available context window, some information will be truncated. The plugin manages this by calculating the available token budget after reserving space for the system message, user question, conversation history and LLM response, then fitting as many relevant chunks as possible within the remaining space.
Challenge 3 — Keeping the index fresh
When users frequently edit notes, the index needs to stay up to date without impacting performance. The plugin handles this by using a content hash to detect which notes have actually changed, so only modified notes are re-embedded. Re-indexing is also debounced by 5 seconds after a note is saved, meaning multiple quick edits only trigger one re-embedding call, as sketched below.
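A sketch of the hash-plus-debounce logic; names are illustrative, and a real indexer would persist the hashes in SQLite rather than in memory:

```typescript
import { createHash } from 'crypto';

const lastHash = new Map<string, string>();
const timers = new Map<string, ReturnType<typeof setTimeout>>();

function onNoteSaved(noteId: string, body: string, reindex: (id: string) => void) {
  clearTimeout(timers.get(noteId)); // collapse rapid successive edits into one call
  timers.set(noteId, setTimeout(() => {
    const hash = createHash('sha256').update(body).digest('hex');
    if (lastHash.get(noteId) === hash) return; // content unchanged: skip re-embedding
    lastHash.set(noteId, hash);
    reindex(noteId);
  }, 5000)); // 5-second debounce, as described above
}
```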
Challenge 4 — Model consistency
The embedding model used for notes and queries must always match, otherwise vectors exist in incompatible spaces. This is handled by using BGE-small via Transformers.js for all embeddings at all times — both when indexing notes and when processing queries. Since the model never changes, consistency is always guaranteed. OpenRouter is only used for LLM generation, not for embeddings.
4. Implementation Plan
Week 1: Familiarisation and Setup
Study the Joplin Plugin API documentation and existing plugin examples. Explore the codebase to understand how plugins interact with notes, panels and commands. Set up the plugin scaffold and development environment. Familiarise myself with Transformers.js and sqlite-vec by running small experiments locally.
Week 2: UI Components
Build all UI components — ChatPanel, MessageInput with @ mention, SourcesPanel, HistoryPanel, SettingsPage and IndexStatus — without backend functionality. UI will be static at this stage but fully designed and navigable on both desktop and mobile layouts.
Week 3: Chunking
Implement note reading using joplin.data.get(['notes']). Build the chunking system that splits notes at heading boundaries first, then by paragraph and semantic breaks, then by sliding window with overlap when needed. Store metadata alongside each chunk — note title, heading path, line number and note ID.
Week 4: Embedding
Integrate Transformers.js and BGE-small-en-v1.5 for generating embeddings from chunks. Send chunks to the model one by one or in parallel batches. Verify that the same model is used consistently for both notes and queries.
Week 5: Storage
Set up SQLite with sqlite-vec for storing embeddings and metadata. Implement the RAM cache that loads all vectors into memory on first search. Store chat history and conversation records in SQLite alongside embeddings.
Week 6: Indexing Pipeline
Add content hash detection so only changed notes are re-embedded. Implement debounced re-indexing triggered 5 seconds after a note is saved. Add background indexing on first run or on a new device, with a progress indicator.
Week 7: Search
Implement Q8 quantization for vectors. Build brute force cosine similarity search against the RAM cache. Implement the top-K heap to track only the highest scoring chunks during the scan. Add result aggregation by note and metadata extraction for citations.
Week 8: Prompt Builder and Token Budget
Build the prompt builder that constructs the final prompt with role instruction, grounding instruction, retrieved note chunks as context and the user question. Implement token budget management — calculate available tokens after reserving space for the system message, question, conversation history and LLM response.
Week 9: LLM Integration
Integrate the OpenRouter API for cloud generation with streaming responses so answers appear token by token. Integrate Ollama for fully offline generation. Add the "no notes found" fallback that offers the user the option to answer from general knowledge.
Week 10: Connecting UI to Backend
Connect all backend logic to the UI components. Citations become clickable with "Go to note" navigation. SourcesPanel populates from real search results. HistoryPanel saves and loads real conversations from SQLite. @ mention triggers real note search and focuses the conversation on the selected note. IndexStatus shows real indexing progress.
Week 11: Testing
Write tests for core pipeline components — chunking, embedding, search and prompt builder. Fix bugs discovered during integration. Ensure mobile and desktop layouts work correctly across platforms.
Week 12: Polish and Documentation
Final bug fixes and performance improvements. Write documentation covering setup, configuration and usage. Final review and submission.
5. Deliverables
- A fully functional Joplin plugin implementing AI chat with notes using RAG
- Core chat interface with numbered citations and "Go to note" navigation
- Offline support via Ollama with fallback to OpenRouter for cloud generation
- @ mention support for focusing the chat on specific notes
- Sources panel showing all notes referenced in a conversation
- Chat history saved and accessible across sessions
- Background indexing with progress indicator on first run and new devices
- "No notes found" fallback with option to answer from general knowledge
- Works on both desktop and mobile
- Unit tests for core pipeline components — chunking, embedding, search and prompt builder
- Documentation covering setup, configuration and usage
6. Availability
I am available for approximately 30 to 35 hours per week during GSoC. I am based in Nigeria (WAT, UTC+1). During weekdays I can commit more time, around 8 to 10 hours per day when possible, and around 5 hours on weekends. I have university examinations that are expected to fall sometime in June or July. I do not have the exact dates yet but I will communicate with my mentors as soon as the schedule is confirmed so we can plan around it. Outside of the exam period I have no other commitments that would affect my availability.
AI Assistance Disclosure
AI was used to improve the grammar and structure of this proposal. All ideas, design decisions, technical reasoning, and the UI wireframe were developed by me.


