GSoC 2026 Proposal Draft – Idea 4: Chat with your note collection using AI – Rebecca Ayodele
Links
- Idea Link:
- GitHub:
- Forum introduction: https://discourse.joplinapp.org/t/welcome-to-gsoc-2026-with-joplin/48974/85
- Pull requests: https://github.com/laurent22/joplin/pull/14883
- Other relevant experience:
1. Introduction
I am a third-year Computer Science and Mathematics student at Obafemi Awolowo University, Nigeria. I have about 4 years of experience in frontend development working with React, TypeScript, JavaScript and Next.js. Recently I have been expanding into AI and machine learning. I previously built a career AI application using Next.js, TypeScript and the OpenRouter API, integrating multiple LLMs including Nemotron, Mistral and Gemini with practical decisions around token limits and cost. This is my first time contributing to open source and I am starting with Joplin.
2. Project Summary
What problem it solves
- A user has hundreds of notes but cannot remember the exact title or keywords needed to find them through regular search
- A user wants to deeply understand a topic from their notebook
- Related topics are spread across different notebooks and the user wants them all in one place to gain insights
Although a similar idea exists in the Jarvis plugin, it has key limitations. Jarvis is a third-party plugin that is not officially maintained by Joplin, meaning it can break when Joplin updates. It is also complex to set up, stores embeddings inside each note's hidden metadata which adds unnecessary data to every note, and does not work well on mobile. Idea 4 would be an official, simpler implementation designed for both desktop and mobile from the start.
What will be implemented
- Core chat interface
Users access the chat through an AI button that appears when they tap the add icon. Clicking it opens a chat panel with an input field that accepts text. When a user asks a question, the AI retrieves relevant notes using RAG and generates an answer. If the answer draws from multiple notes, each part of the response is labelled with its source — for example Note 1, Note 2 — similar to how NotebookLM handles citations. Clicking a citation label shows a "Go to note" option that takes the user directly to that note, and pressing back returns them to the chat. A resources panel on the side of the chat lists all notes referenced in the conversation, giving users another way to navigate to their sources. The chat history is also saved so users can return to previous conversations.
- Offline support
The plugin works fully without internet by using a local LLM through Ollama, meaning the user's note data never leaves their device. This aligns with Joplin's core philosophy of being an offline-first application and reduces security and privacy risk. When Ollama is not installed, the plugin falls back gracefully to a cloud LLM through OpenRouter.
Expected Outcome
By the end of the project, Joplin users will be able to open a dedicated AI chat panel and have natural conversations with their notes. Questions will return answers with numbered citations linking directly to the source notes, similar to how NotebookLM works. Users will be able to navigate to any referenced note and return to the chat seamlessly. A resources panel will list all notes referenced in the conversation. Chat history will be saved for future reference. The plugin will work fully offline using Ollama, keeping all note data on the user's device, with a fallback to cloud LLM via OpenRouter when needed.
Desktop UX screenshot
3. Technical Approach
RAG Pipeline
Chunking
When a note is processed, it cannot be sent to the embedding model as a whole. Sending an entire note at once would dilute its meaning — the resulting vector would represent an average of everything in the note, making it harder to retrieve specific information accurately. Embedding models also have token limits, so long notes must be split into smaller pieces called chunks.
Notes are split at heading boundaries first, since content under the same heading is usually related. If a section has no headings, the plugin splits by paragraph and semantic breaks — points where the topic shifts. If a chunk is still too long after that, a sliding window with overlap is used to ensure context is preserved across boundaries. Each chunk is stored alongside its metadata: the note title, heading path, line number, and note ID. This metadata is what powers the citation system later.
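A minimal sketch of this splitting logic is below, using a word count as a stand-in for real token counts; paragraph and semantic-break splitting are omitted for brevity, and all names and limits are illustrative rather than final plugin code:

```typescript
// Illustrative chunker: heading boundaries first, then a sliding window
// with overlap. Word counts approximate token counts for this sketch.
interface Chunk {
  text: string;
  noteId: string;
  headingPath: string;
  startLine: number; // first line of the section this chunk came from
}

const MAX_WORDS = 200; // stand-in for the embedding model's token limit
const OVERLAP = 30;    // words shared between consecutive windows

function chunkNote(noteId: string, body: string): Chunk[] {
  const chunks: Chunk[] = [];
  let heading = '';
  let startLine = 0;
  let buffer: string[] = [];

  const flush = () => {
    const words = buffer.join(' ').split(/\s+/).filter(Boolean);
    for (let i = 0; i < words.length; i += MAX_WORDS - OVERLAP) {
      chunks.push({
        text: words.slice(i, i + MAX_WORDS).join(' '),
        noteId, headingPath: heading, startLine,
      });
      if (i + MAX_WORDS >= words.length) break; // last window reached
    }
    buffer = [];
  };

  body.split('\n').forEach((line, i) => {
    if (/^#{1,6}\s/.test(line)) { // a Markdown heading closes the previous section
      flush();
      heading = line.replace(/^#{1,6}\s*/, '');
      startLine = i + 1; // content begins after the heading line
    } else {
      buffer.push(line);
    }
  });
  flush();
  return chunks;
}
```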
Embedding
Once a note is chunked, each chunk is sent to the embedding model one by one or in parallel batches for speed. The model converts the text into a vector — a list of numbers that represents the meaning of that chunk. These vectors are what get stored and searched later.
The embedding model used is BGE-small-en-v1.5, running locally via Transformers.js. This model runs entirely in JavaScript with no Python or internet connection required, making it suitable for Joplin's offline-first philosophy. It is small enough to download once and fast enough to run on a typical user's device while still producing high quality embeddings for retrieval tasks.
To ensure model consistency, BGE-small is used for all embeddings — both for indexing notes and for converting the user's query at search time. Since the same model is always used, the vectors always exist in the same space and comparisons are always meaningful. OpenRouter is only used for LLM generation, not for embeddings, so switching between online and offline modes never creates a model mismatch.
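A sketch of the embedding call using Transformers.js follows. The `Xenova/bge-small-en-v1.5` model id refers to the community ONNX conversion of BGE-small and is an assumption about how the model would be packaged:

```typescript
import { pipeline } from '@xenova/transformers';

let extractor: any; // lazily created feature-extraction pipeline

// Convert a batch of chunk texts into 384-dimensional vectors.
async function embed(texts: string[]): Promise<number[][]> {
  // The model is downloaded once and cached locally, so later runs work offline.
  extractor ??= await pipeline('feature-extraction', 'Xenova/bge-small-en-v1.5');
  // Mean pooling plus normalization yields one unit-length vector per input,
  // which lets cosine similarity at search time reduce to a dot product.
  const output = await extractor(texts, { pooling: 'mean', normalize: true });
  return output.tolist();
}
```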
Storage
Embeddings are stored in a local SQLite database using the sqlite-vec extension. SQLite was chosen over other options because it is fast, simple, and well suited for local desktop applications. Storing embeddings in a JSON file would be too slow for large collections, and storing them in Joplin's note userData — as the Jarvis plugin does — adds hidden data to every note and increases complexity. SQLite keeps the index in a dedicated file that is separate from the notes themselves.
Since SQLite involves disk access, all vectors are loaded into RAM on the first search. All subsequent searches happen entirely in memory, making them fast. Chat history and conversation records are also stored in SQLite alongside the embeddings.
Because SQLite does not sync automatically across devices, when a user opens Joplin on a new device the plugin detects there is no index and rebuilds it automatically in the background, showing a progress indicator so the user knows what is happening.
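A possible schema is sketched below, assuming a Node runtime with better-sqlite3 and the sqlite-vec npm package (the actual plugin runtime may require a different SQLite driver); table and column names are illustrative:

```typescript
import Database from 'better-sqlite3';
import * as sqliteVec from 'sqlite-vec';

const db = new Database('ai-chat-index.sqlite'); // separate file, not note data
sqliteVec.load(db); // registers the vec0 virtual-table module

db.exec(`
  CREATE TABLE IF NOT EXISTS chunks (
    id INTEGER PRIMARY KEY,
    note_id TEXT NOT NULL,
    note_title TEXT NOT NULL,
    heading_path TEXT,
    start_line INTEGER,
    content_hash TEXT NOT NULL      -- used to skip unchanged notes on re-index
  );
  -- rowid mirrors chunks.id; 384 dimensions matches BGE-small-en-v1.5 output
  CREATE VIRTUAL TABLE IF NOT EXISTS chunk_vectors USING vec0(embedding float[384]);
  CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY,
    conversation_id TEXT NOT NULL,
    role TEXT NOT NULL,             -- 'user' or 'assistant'
    body TEXT NOT NULL,
    created_at INTEGER NOT NULL
  );
`);
```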
Search
When a user types a question, the question text is converted into an embedding using the same BGE-small model used to index the notes. The result is a single vector representing the meaning of the question, which is then quantized to Q8 to match the format of the cached note vectors and make comparisons faster on CPU.
The search runs as a brute force scan against all note chunk vectors in RAM. Rather than using an approximate index like FAISS, brute force is preferred because for the typical size of a Joplin note collection it is already fast enough at 10 to 50 milliseconds, it guarantees 100% recall meaning no relevant note is ever missed, and it is simpler to implement and maintain. FAISS is designed for collections of millions of vectors and would add unnecessary complexity for a personal note application. Similarly, LangChain was not chosen because it adds heavy dependencies and abstracts away the specific controls needed for custom chunking, citation metadata and token budget management — a direct implementation gives more flexibility and control.
Similarity between vectors is measured using cosine similarity: a score of 1.0 means the vectors point in the same direction (identical meaning), 0 means they are perpendicular (no relation), and -1.0 means they point in opposite directions (opposite meaning).
During the scan a top-K heap keeps track of only the highest scoring chunks. As each chunk is scored, if it scores higher than the lowest result currently in the heap it replaces it. This means only K results are held in memory at once. Since results are individual chunks and not whole notes, they are grouped by note and their scores aggregated to produce a final ranking. Each result carries the note title, heading path, line number and note ID — this is what enables the citation system.
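A sketch of the scan is shown below. Because vectors are normalized at embedding time, cosine similarity reduces to a dot product; a small sorted array stands in for a true min-heap here, which is fine for small K:

```typescript
interface Scored { chunkId: number; score: number }

// Dot product of two unit vectors equals their cosine similarity.
function dot(a: Float32Array, b: Float32Array): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

// Brute-force top-K scan over the in-RAM vector cache.
function topK(query: Float32Array, cache: Map<number, Float32Array>, k: number): Scored[] {
  const heap: Scored[] = []; // kept sorted ascending; heap[0] is the current minimum
  for (const [chunkId, vec] of cache) {
    const score = dot(query, vec);
    if (heap.length < k) {
      heap.push({ chunkId, score });
      heap.sort((x, y) => x.score - y.score);
    } else if (score > heap[0].score) {
      heap[0] = { chunkId, score };           // evict the lowest-scoring result
      heap.sort((x, y) => x.score - y.score); // re-sort; O(k log k) but k is tiny
    }
  }
  return heap.reverse(); // highest score first
}
```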
Prompt Construction and Token Budget
Before sending to the LLM, a token budget is calculated. The LLM has a maximum context window and can only process a certain number of tokens at once. Tokens are reserved for the system message, the user's question, conversation history from previous turns, and the LLM's response. Whatever tokens remain are used for the retrieved note chunks. If the chunks exceed the available budget, as many as fit are included and the rest are truncated.
The final prompt contains four things: a role instruction telling the LLM it is a note assistant, a grounding instruction telling it to only answer from the provided notes and acknowledge when information is not available, the retrieved note chunks as context, and the user's actual question. Grounding is important because it prevents the LLM from generating answers from its training data rather than the user's actual notes.
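A sketch of the budget logic follows; the context-window size, response reserve and token estimator are illustrative placeholders (a real implementation would use the model's tokenizer):

```typescript
const CONTEXT_WINDOW = 8192;   // assumed model context size
const RESPONSE_RESERVE = 1024; // tokens kept free for the LLM's answer

// Crude estimate; a real tokenizer would replace this.
const estimateTokens = (text: string) =>
  Math.ceil(text.split(/\s+/).filter(Boolean).length * 1.3);

function buildPrompt(system: string, history: string, question: string, chunks: string[]): string {
  let budget = CONTEXT_WINDOW - RESPONSE_RESERVE
    - estimateTokens(system) - estimateTokens(history) - estimateTokens(question);

  const context: string[] = [];
  for (const chunk of chunks) { // chunks arrive sorted by relevance
    const cost = estimateTokens(chunk);
    if (cost > budget) break;   // truncate: whatever no longer fits is dropped
    context.push(chunk);
    budget -= cost;
  }

  return [
    system,
    'Answer ONLY from the notes below. If the answer is not in them, say so.',
    ...context.map((c, i) => `[Note ${i + 1}]\n${c}`),
    history,
    `Question: ${question}`,
  ].join('\n\n');
}
```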
Generation and Citations
The prompt is sent to either OpenRouter for cloud generation or Ollama for local offline generation. The LLM generates the answer token by token so the user sees the response appearing gradually rather than waiting for the full answer. After the response, numbered citations show the user exactly which notes the answer came from. Clicking a citation shows a "Go to note" option that navigates directly to the source note.
When no relevant notes are found for a query, the plugin does not return an empty response. Instead it shows the user a message explaining that nothing was found in their notes and offers the option to answer from general knowledge instead. This gives the user control over whether they want the LLM to draw from its training data.
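A sketch of streaming from a local Ollama server through its REST API, which returns newline-delimited JSON, is shown below; the model name is a placeholder:

```typescript
// Yields response fragments as Ollama produces them.
async function* generateLocal(prompt: string): AsyncGenerator<string> {
  const res = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    body: JSON.stringify({ model: 'llama3.2', prompt, stream: true }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let buffered = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffered += decoder.decode(value, { stream: true });
    const lines = buffered.split('\n');
    buffered = lines.pop()!; // keep any partial line for the next read
    for (const line of lines) {
      if (!line.trim()) continue;
      const msg = JSON.parse(line);
      if (msg.response) yield msg.response; // one fragment at a time
    }
  }
}
```

If this request fails because Ollama is not running, the same prompt would be sent to OpenRouter instead, implementing the fallback described earlier.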
Architecture and Components
Frontend (UI) components:
- ChatPanel — the main chat interface. Users can type @ to mention a specific note and focus the conversation on that note only
- MessageInput — text input with @ mention support
- SourcesPanel — slide-in panel listing all notes referenced in the conversation
- HistoryPanel — previous conversations with their associated sources
- SettingsPage — API key configuration and index controls
- IndexStatus — displays indexing progress on first run or on a new device
Backend (logic) components:
- embeddings.ts — converts text to vectors using Transformers.js
- indexer.ts — handles chunking notes and building and updating the index
- search.ts — brute force search and top-K heap
- llm.ts — sends prompts to OpenRouter or Ollama
- database.ts — SQLite storage and RAM cache management
- promptBuilder.ts — constructs the prompt with token budget management
Integration with the Joplin Codebase
Since this is a plugin, it does not modify Joplin's core codebase. Instead it integrates through the Joplin Plugin API:
- joplin.data.get(['notes']) to read note content
- joplin.views.panels to create the chat panel
- joplin.commands.execute('openNote', noteId) to navigate to a note
- joplin.workspace.onNoteChange to detect when notes are edited and trigger re-indexing
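A minimal sketch of how these calls could fit together in the plugin entry point; the panel HTML and handler bodies are placeholders:

```typescript
import joplin from 'api';

joplin.plugins.register({
  onStart: async () => {
    // Chat panel
    const panel = await joplin.views.panels.create('aiChatPanel');
    await joplin.views.panels.setHtml(panel, '<div id="chat-root">Loading…</div>');

    // Trigger (debounced) re-indexing when a note changes
    await joplin.workspace.onNoteChange(async (event) => {
      // event.id identifies the changed note
    });

    // Paginated read of all notes for the initial index build
    let page = 1;
    let hasMore = true;
    while (hasMore) {
      const res = await joplin.data.get(['notes'], {
        fields: ['id', 'title', 'body'],
        page,
      });
      // ...chunk and embed res.items here...
      hasMore = res.has_more;
      page++;
    }
  },
});
```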
Libraries and Technologies
- TypeScript — main language for the plugin, consistent with Joplin's codebase
- React — building the chat UI components, which I have 4 years of experience with
- Transformers.js (Hugging Face) — running BGE-small-en-v1.5 locally in JavaScript for offline embeddings. Chosen because it runs ONNX models entirely in JavaScript with no Python needed, which is critical for a Joplin plugin environment
- SQLite with sqlite-vec — storing embeddings and chat history locally. Chosen over JSON for speed and over the userData approach for simplicity and cleanliness
- OpenRouter API — cloud LLM generation with access to multiple models through a single API key. I have production experience with this from my career AI app, where I integrated Nemotron, Mistral and Gemini
- Ollama — local LLM for fully offline generation, aligning with Joplin's offline-first philosophy
- Joplin Plugin API — reading notes, creating panels, navigating to notes
Potential Challenges
Challenge 1 — First-time indexing on a new device
Users with large note collections may experience a slow initial indexing process when opening Joplin on a new device. The plugin handles this by running indexing in the background while showing a progress indicator, allowing the user to continue using Joplin normally.
Challenge 2 — Token budget management
The LLM can only process a limited number of tokens at once. If the most relevant chunks exceed the available context window, some information will be truncated. The plugin manages this by calculating the available token budget after reserving space for the system message, user question, conversation history and LLM response, then fitting as many relevant chunks as possible within the remaining space.
Challenge 3 — Keeping the index fresh
When users frequently edit notes, the index needs to stay up to date without impacting performance. The plugin handles this by using a content hash to detect which notes have actually changed, so only modified notes are re-embedded. Re-indexing is also debounced by 5 seconds after a note is saved, meaning multiple quick edits only trigger one re-embedding call, as sketched below.
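A sketch of the hash-plus-debounce logic; names are illustrative, and a real indexer would persist the hashes in SQLite rather than in memory:

```typescript
import { createHash } from 'crypto';

const lastHash = new Map<string, string>();
const timers = new Map<string, ReturnType<typeof setTimeout>>();

function onNoteSaved(noteId: string, body: string, reindex: (id: string) => void) {
  clearTimeout(timers.get(noteId)); // collapse rapid successive edits into one call
  timers.set(noteId, setTimeout(() => {
    const hash = createHash('sha256').update(body).digest('hex');
    if (lastHash.get(noteId) === hash) return; // content unchanged: skip re-embedding
    lastHash.set(noteId, hash);
    reindex(noteId);
  }, 5000)); // 5-second debounce, as described above
}
```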
Challenge 4 — Model consistency
The embedding model used for notes and queries must always match, otherwise vectors exist in incompatible spaces. This is handled by using BGE-small via Transformers.js for all embeddings at all times — both when indexing notes and when processing queries. Since the model never changes, consistency is always guaranteed. OpenRouter is only used for LLM generation, not for embeddings.
4. Implementation Plan
Week 1: Familiarisation and Setup
Study the Joplin Plugin API documentation and existing plugin examples. Explore the codebase to understand how plugins interact with notes, panels and commands. Set up the plugin scaffold and development environment. Familiarise myself with Transformers.js and sqlite-vec by running small experiments locally.
Week 2: UI Components
Build all UI components — ChatPanel, MessageInput with @ mention, SourcesPanel, HistoryPanel, SettingsPage and IndexStatus — without backend functionality. UI will be static at this stage but fully designed and navigable on both desktop and mobile layouts.
Week 3: Chunking
Implement note reading using joplin.data.get(['notes']). Build the chunking system that splits notes at heading boundaries first, then by paragraph and semantic breaks, then by sliding window with overlap when needed. Store metadata alongside each chunk — note title, heading path, line number and note ID.
Week 4: Embedding
Integrate Transformers.js and BGE-small-en-v1.5 for generating embeddings from chunks. Send chunks to the model one by one or in parallel batches. Verify that the same model is used consistently for both notes and queries.
Week 5: Storage
Set up SQLite with sqlite-vec for storing embeddings and metadata. Implement the RAM cache that loads all vectors into memory on first search. Store chat history and conversation records in SQLite alongside embeddings.
Week 6: Indexing Pipeline
Add content hash detection so only changed notes are re-embedded. Implement debounced re-indexing triggered 5 seconds after a note is saved. Add background indexing on first run or on a new device, with a progress indicator.
Week 7: Search
Implement Q8 quantization for vectors. Build brute force cosine similarity search against the RAM cache. Implement the top-K heap to track only the highest scoring chunks during the scan. Add result aggregation by note and metadata extraction for citations.
Week 8: Prompt Builder and Token Budget
Build the prompt builder that constructs the final prompt with role instruction, grounding instruction, retrieved note chunks as context and the user question. Implement token budget management — calculate available tokens after reserving space for the system message, question, conversation history and LLM response.
Week 9: LLM Integration
Integrate the OpenRouter API for cloud generation with streaming responses so answers appear token by token. Integrate Ollama for fully offline generation. Add the "no notes found" fallback that offers the user the option to answer from general knowledge.
Week 10: Connecting UI to Backend
Connect all backend logic to the UI components. Citations become clickable with "Go to note" navigation. SourcesPanel populates from real search results. HistoryPanel saves and loads real conversations from SQLite. @ mention triggers real note search and focuses the conversation on the selected note. IndexStatus shows real indexing progress.
Week 11: Testing
Write tests for core pipeline components — chunking, embedding, search and prompt builder. Fix bugs discovered during integration. Ensure mobile and desktop layouts work correctly across platforms.
Week 12: Polish and Documentation
Final bug fixes and performance improvements. Write documentation covering setup, configuration and usage. Final review and submission.
5. Deliverables
- A fully functional Joplin plugin implementing AI chat with notes using RAG
- Core chat interface with numbered citations and "Go to note" navigation
- Offline support via Ollama with fallback to OpenRouter for cloud generation
- @ mention support for focusing the chat on specific notes
- Sources panel showing all notes referenced in a conversation
- Chat history saved and accessible across sessions
- Background indexing with progress indicator on first run and new devices
- "No notes found" fallback with option to answer from general knowledge
- Works on both desktop and mobile
- Unit tests for core pipeline components — chunking, embedding, search and prompt builder
- Documentation covering setup, configuration and usage
6. Availability
I am available for approximately 30 to 35 hours per week during GSoC. I am based in Nigeria (WAT, UTC+1). During weekdays I can commit more time, around 8 to 10 hours per day when possible, and around 5 hours on weekends. I have university examinations that are expected to fall sometime in June or July. I do not have the exact dates yet but I will communicate with my mentors as soon as the schedule is confirmed so we can plan around it. Outside of the exam period I have no other commitments that would affect my availability.
AI Assistance Disclosure
AI was used to improve the grammar and structure of this proposal. All ideas, design decisions, technical reasoning, and the UI wireframe were developed by me.


