GSoC 2026 Proposal Draft – Idea 4: Chat with your note collection using AI – Rebecca Ayodele (Updated)

This is an updated version of my proposal based on mentor feedback. Please refer to this version.

Link

1. Introduction

Background / studies

I am a third-year Computer Science and Mathematics student at Obafemi Awolowo University, Nigeria.

Programming Experience

I have about 4 years of experience in frontend development working with React, TypeScript, JavaScript and Next.js. I worked at a startup for four months building and revamping web pages, integrating API endpoints and collaborating with backend developers and UI/UX designers. I also participated in the Lagos Impact Hackathon where my team placed 7th out of over 70 teams.
Recently I have been expanding into AI and machine learning, holding three certifications from the NVIDIA Deep Learning Institute:

  • Building RAG Agents with LLMs
  • Fundamentals of Deep Learning
  • Fundamentals of Accelerated Data Science

I previously built a career AI application using Next.js, TypeScript and the OpenRouter API, integrating multiple LLMs including Nemotron, Mistral and Gemini with practical decisions around token limits and cost. I also built a RAG proof of concept called Recall using BGE-small embeddings and cosine similarity search which directly informed the technical decisions in this proposal.

2. Project Summary

What problem it solves

  • A user has hundreds of notes but cannot remember the exact title or keywords used to find them through regular search

  • A user wants to deeply understand a topic from their notebook

  • Related topics are spread across different notebooks and the user wants them all in one place to gain insights

Although a similar idea exists in the Jarvis plugin, it has key limitations. Jarvis is a third-party plugin that is not officially maintained by Joplin, meaning it can break when Joplin updates. It is also complex to set up and stores embeddings inside each note's hidden metadata, which adds data to every individual note rather than keeping the index in a dedicated location. Idea 4 would be an official implementation with a cleaner architecture, one where the embedding index is designed to be shared across features like search, categorisation and chat, rather than each feature maintaining its own isolated index.

What will be implemented

1. Core Chat Interface

Users access the chat through an AI button that appears when they tap the add icon. Clicking it opens a chat panel with an input field that accepts text. When a user asks a question, the AI retrieves relevant notes using RAG and generates an answer. If the answer draws from multiple notes, each part of the response is labelled with its source — for example Note 1, Note 2 — similar to how NotebookLM handles citations. Clicking a citation label shows a "Go to note" option that takes the user directly to that note, and pressing back returns them to the chat. A resources panel on the side of the chat lists all notes referenced in the conversation, giving users another way to navigate to their sources. The chat history is also saved so users can return to previous conversations.

2. Offline Support

On desktop the plugin works fully without internet by using a local LLM through Ollama, meaning the user's note data never leaves their device. This aligns with Joplin's core philosophy of being an offline-first application. On mobile, where local model execution is not feasible due to platform constraints, the plugin falls back to the Hugging Face Inference API for embeddings and OpenRouter for generation using the user's own API key. When Ollama is not installed on desktop, the plugin falls back gracefully to OpenRouter as well. When cloud APIs are used, only the relevant note chunks retrieved for that query are sent, not the user's entire note collection. API keys are stored locally in Joplin's settings and never transmitted anywhere other than the respective API provider.
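
As a rough illustration, the provider selection could look like the sketch below. Ollama's default local port (11434) and its /api/tags endpoint are real; the function name and decision flow are placeholders for how llm.ts might decide, not a final design.

```typescript
// Illustrative provider selection; port 11434 and /api/tags are Ollama defaults,
// everything else is an assumption about how llm.ts could decide.
type Provider = 'ollama' | 'openrouter';

async function pickProvider(isMobile: boolean, apiKey?: string): Promise<Provider> {
  if (isMobile) return 'openrouter';            // no local model execution on mobile
  try {
    // Ollama exposes a simple HTTP API on localhost when it is running.
    const res = await fetch('http://localhost:11434/api/tags');
    if (res.ok) return 'ollama';
  } catch {
    // Ollama not installed or not running on this machine.
  }
  if (!apiKey) throw new Error('No local LLM available and no OpenRouter API key configured.');
  return 'openrouter';                          // cloud fallback with the user's own key
}
```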

Expected Outcome

By the end of the project, Joplin users will be able to open a dedicated AI chat panel and have natural conversations with their notes. Questions will return answers with numbered citations linking directly to the source notes, similar to how NotebookLM works. Users will be able to navigate to any referenced note and return to the chat seamlessly. A resources panel will list all notes referenced in the conversation. Chat history will be saved for future reference. The plugin will work fully offline using Ollama, keeping all note data on the user's device, with a fallback to cloud LLM via OpenRouter when needed.

Desktop UX mockup

Mobile UX mockup

3. Technical Approach

Architecture and Components

Frontend (UI) components:

  • ChatPanel — the main chat interface. Users can type @ to mention a specific note and focus the conversation on that note only

  • MessageInput — text input with @ mention support

  • SourcesPanel — slide-in panel listing all notes referenced in the conversation

  • HistoryPanel — previous conversations with their associated sources

  • SettingsPage — API key configuration and index controls

  • IndexStatus — displays indexing progress on first run or new device

Backend (logic) components:

  • embeddings.ts — converts text to vectors using Transformers.js locally on desktop, or the Hugging Face Inference API on mobile where local model execution is not supported

  • indexer.ts — handles chunking notes and building and updating the index

  • search.ts — brute force cosine similarity search with top-K heap, combined with Joplin's keyword search, merged via RRF and reranked using ms-marco-MiniLM-L-6-v2

  • llm.ts — sends prompts to Ollama for fully local generation on desktop, OpenRouter as cloud fallback on desktop, and OpenRouter for generation on mobile with the user's own API key

  • database.ts — SQLite with sqlite-vec for storing embeddings, RAM cache management and chat history on desktop. On mobile, embeddings are generated on demand via the Hugging Face Inference API with no local index maintained.

  • promptBuilder.ts — constructs the prompt with token budget management

RAG Pipeline

Chunking

When a note is processed, it cannot be sent to the embedding model as a whole. Sending an entire note at once would dilute its meaning. The resulting vector would represent an average of everything in the note, making it harder to retrieve specific information accurately. Embedding models also have token limits, so long notes must be split into smaller pieces called chunks.

Notes are split at heading boundaries first, since content under the same heading is usually related. If a section has no headings, the plugin splits by paragraph and at semantic breaks (points where the topic shifts). If a chunk is still too long after that, a sliding window with overlap is used to ensure context is preserved across boundaries. Each chunk is stored alongside its metadata: the note title, heading path, line number, and note ID. This metadata is what powers the citation system later.
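
A rough sketch of the chunking logic I have in mind is shown below. The helper names and size limits are placeholders; the real values in indexer.ts would be tuned against BGE-small's token limit.

```typescript
// Hypothetical sketch of heading-first chunking; names and limits are assumptions.
interface Chunk {
  noteId: string;
  noteTitle: string;
  headingPath: string[]; // e.g. ["Project", "Architecture"]
  startLine: number;
  text: string;
}

const MAX_CHUNK_CHARS = 1200;   // assumed limit, roughly within BGE-small's token budget
const OVERLAP_CHARS = 200;      // assumed sliding-window overlap

function chunkNote(noteId: string, noteTitle: string, body: string): Chunk[] {
  const chunks: Chunk[] = [];
  const lines = body.split('\n');
  let headingPath: string[] = [];
  let buffer: string[] = [];
  let bufferStart = 0;

  const flush = (endLine: number) => {
    const text = buffer.join('\n').trim();
    if (!text) return;
    // Sections that are still too long fall back to a sliding window with overlap.
    for (let start = 0; start < text.length; start += MAX_CHUNK_CHARS - OVERLAP_CHARS) {
      chunks.push({
        noteId,
        noteTitle,
        headingPath: [...headingPath],
        startLine: bufferStart,
        text: text.slice(start, start + MAX_CHUNK_CHARS),
      });
      if (start + MAX_CHUNK_CHARS >= text.length) break;
    }
    buffer = [];
    bufferStart = endLine + 1;
  };

  lines.forEach((line, i) => {
    const heading = line.match(/^(#{1,6})\s+(.*)/);
    if (heading) {
      flush(i - 1);                       // close the previous section at the heading boundary
      const level = heading[1].length;
      headingPath = [...headingPath.slice(0, level - 1), heading[2]];
      bufferStart = i + 1;
    } else {
      buffer.push(line);
    }
  });
  flush(lines.length - 1);
  return chunks;
}
```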

Embedding

Once a note is chunked, each chunk is sent to the embedding model one by one or in parallel batches for speed. The model converts the text into a vector — a list of numbers that represents the meaning of that chunk. These vectors are what get stored and searched later.

The embedding model used is BGE-small-en-v1.5, running locally via Transformers.js. This model runs entirely in JavaScript with no Python or internet connection required, making it suitable for Joplin's offline-first philosophy. It is small enough to download once and fast enough to run on a typical user's device while still producing high quality embeddings for retrieval tasks.

To ensure model consistency, BGE-small is used for all embeddings — both for indexing notes and for converting the user's query at search time. On mobile, the same model is accessed via the Hugging Face Inference API. Since the same model is always used, the vectors always exist in the same space and comparisons are always meaningful. OpenRouter is only used for LLM generation, not for embeddings, so switching between online and offline modes never creates a model mismatch.

I chose BGE-small because it produces 384-dimensional embeddings and is lightweight enough to run efficiently on CPU, making it suitable for background indexing without affecting the user experience.
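
A minimal sketch of the desktop embedding path in embeddings.ts follows. The feature-extraction pipeline and pooling options come from the public Transformers.js API; the lazy initialisation and function names are my assumptions.

```typescript
import { pipeline } from '@xenova/transformers';

// Lazily initialised so the model is only downloaded and loaded once per session.
let extractor: any = null;

async function getExtractor() {
  if (!extractor) {
    extractor = await pipeline('feature-extraction', 'Xenova/bge-small-en-v1.5');
  }
  return extractor;
}

// Embeds a batch of chunk texts into 384-dimensional vectors.
export async function embedChunks(texts: string[]): Promise<number[][]> {
  const model = await getExtractor();
  // Mean pooling + normalisation gives unit-length vectors, so cosine
  // similarity at search time reduces to a plain dot product.
  const output = await model(texts, { pooling: 'mean', normalize: true });
  return output.tolist() as number[][];
}
```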

Storage

Embeddings are stored in a local SQLite database using the sqlite-vec extension. SQLite was chosen over other options because it is fast, simple, and well suited for local desktop applications. Storing embeddings in a JSON file would be too slow for large collections, and storing them in Joplin's note userData — as the Jarvis plugin does — adds hidden data to every note and increases complexity. SQLite keeps the index in a dedicated file that is separate from the notes themselves.

Since SQLite involves disk access, the vectors are loaded into a RAM cache on the first search, and all subsequent searches run entirely in memory. Chat history and conversation records are also stored in SQLite alongside the embeddings.

Because SQLite does not sync automatically across devices, when a user opens Joplin on a new device the plugin detects there is no index and rebuilds it automatically in the background, showing a progress indicator so the user knows what is happening.
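
An illustrative schema for database.ts is sketched below. The vec0 virtual table syntax comes from sqlite-vec; the table and column names are placeholders, and a better-sqlite3-style binding is assumed to execute the statements at startup.

```typescript
// Illustrative schema for database.ts; table and column names are assumptions,
// but the vec0 virtual table syntax is sqlite-vec's.
const SCHEMA = `
  -- One row per chunk, with the metadata needed for citations.
  CREATE TABLE IF NOT EXISTS chunks (
    id INTEGER PRIMARY KEY,
    note_id TEXT NOT NULL,
    note_title TEXT NOT NULL,
    heading_path TEXT,          -- JSON-encoded array of headings
    start_line INTEGER,
    text TEXT NOT NULL,
    content_hash TEXT NOT NULL  -- used to skip unchanged notes on re-index
  );

  -- 384-dimensional BGE-small embeddings, one per chunk (rowid matches chunks.id).
  CREATE VIRTUAL TABLE IF NOT EXISTS chunk_vectors USING vec0(
    embedding float[384]
  );

  -- Saved conversations for the HistoryPanel.
  CREATE TABLE IF NOT EXISTS chat_messages (
    id INTEGER PRIMARY KEY,
    conversation_id TEXT NOT NULL,
    role TEXT NOT NULL,         -- 'user' or 'assistant'
    content TEXT NOT NULL,
    created_at INTEGER NOT NULL
  );
`;
// A better-sqlite3-style binding would run something like db.exec(SCHEMA) on startup.
```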

Search

When a user submits a query, two searches run in parallel. The first is a vector search: the query is converted into an embedding using bge-small-en-v1.5, quantized to Q8, and compared against cached note embeddings in RAM using cosine similarity. A brute-force scan is used instead of FAISS, as it is fast enough on CPU for typical Joplin collections, guarantees full recall, and avoids added complexity. Each result carries the note title, heading path, line number, and note ID, enabling the citation system.

In parallel, a keyword search via joplin.data.get('search', { query: userQuery }) captures exact terms and technical phrases. Both ranked lists are merged using Reciprocal Rank Fusion (RRF) with k=60, favoring results that appear highly in both lists. The merged results, up to 15, are reranked with ms-marco-MiniLM-L-6-v2, which scores each query–chunk pair for relevance. The highest-ranked chunks that fit within the available token budget are passed to the prompt builder, while all top results remain accessible, and multiple chunks from the same note are grouped to preserve context.
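
The core of search.ts could look like the sketch below. Function names are illustrative; the real implementation would map keyword hits onto chunks and use a top-K heap rather than a full sort, but the cosine scan and the RRF formula are exactly what is described above.

```typescript
interface CachedVector { chunkId: string; vector: Float32Array; }

// With normalised embeddings, cosine similarity reduces to a dot product.
function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0;
  for (let i = 0; i < a.length; i++) dot += a[i] * b[i];
  return dot;
}

// Brute-force scan over the RAM cache, keeping only the top K chunk IDs.
function vectorSearch(query: Float32Array, cache: CachedVector[], topK = 15): string[] {
  return cache
    .map(entry => ({ id: entry.chunkId, score: cosine(query, entry.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map(r => r.id);
}

// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank), with k = 60.
function rrfMerge(vectorIds: string[], keywordIds: string[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of [vectorIds, keywordIds]) {
    list.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```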

Prompt Construction and Token Budget

Before sending to the LLM, a token budget is calculated. The LLM has a maximum context window and can only process a certain number of tokens at once. Tokens are reserved for the system message, the user's question, conversation history from previous turns, and the LLM's response. Whatever tokens remain are used for the retrieved note chunks. If the chunks exceed the available budget, as many as fit are included and the rest are truncated.

The final prompt contains four things: a role instruction telling the LLM it is a note assistant, a grounding instruction telling it to only answer from the provided notes and acknowledge when information is not available, the retrieved note chunks as context, and the user's actual question. Grounding is important because it prevents the LLM from generating answers from its training data rather than the user's actual notes.

The token budget is divided as follows: approximately 200 tokens are reserved for the system prompt, 500 for conversation history, and 2000 for retrieved note chunks. Whatever remains is available for the LLM response. If retrieved chunks exceed the available budget, the lowest scoring ones are dropped first.
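
A sketch of how promptBuilder.ts could enforce this budget is shown below. The 200/500/2000 split mirrors the numbers above, and the character-based token estimate is a rough stand-in for whatever tokenizer is actually used.

```typescript
// Sketch of promptBuilder.ts; budgets mirror the split described above.
interface ScoredChunk { noteTitle: string; text: string; score: number; }

const HISTORY_BUDGET = 500;
const CHUNK_BUDGET = 2000;

// Rough stand-in for a real tokenizer (~4 characters per token).
const countTokens = (text: string): number => Math.ceil(text.length / 4);

function buildPrompt(question: string, history: string[], chunks: ScoredChunk[]): string {
  // Role + grounding instructions (well within the ~200-token system budget).
  const system =
    'You are a note assistant. Answer only from the notes provided below. ' +
    'If the notes do not contain the answer, say so instead of guessing.';

  // Keep the most recent conversation turns that fit in the history budget.
  const recentHistory: string[] = [];
  let historyTokens = 0;
  for (const turn of [...history].reverse()) {
    historyTokens += countTokens(turn);
    if (historyTokens > HISTORY_BUDGET) break;
    recentHistory.unshift(turn);
  }

  // Highest-scoring chunks first; the lowest-scoring ones are dropped when the budget runs out.
  const selected: ScoredChunk[] = [];
  let chunkTokens = 0;
  for (const chunk of [...chunks].sort((a, b) => b.score - a.score)) {
    const cost = countTokens(chunk.text);
    if (chunkTokens + cost > CHUNK_BUDGET) continue;
    selected.push(chunk);
    chunkTokens += cost;
  }

  const context = selected
    .map((c, i) => `[Note ${i + 1}: ${c.noteTitle}]\n${c.text}`)
    .join('\n\n');

  return `${system}\n\n${recentHistory.join('\n')}\n\nNotes:\n${context}\n\nQuestion: ${question}`;
}
```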

Generation and Citations

The prompt is sent to either OpenRouter for cloud generation or Ollama for local offline generation. The LLM generates the answer token by token so the user sees the response appearing gradually rather than waiting for the full answer. After the response, numbered citations show the user exactly which notes the answer came from. Clicking a citation shows a "Go to note" option that navigates directly to the source note.

When no relevant notes are found for a query, the plugin does not return an empty response. Instead it shows the user a message explaining that nothing was found in their notes and offers the option to answer from general knowledge instead. This gives the user control over whether they want the LLM to draw from its training data.
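
A minimal sketch of the local streaming path in llm.ts follows. The /api/generate endpoint and newline-delimited JSON stream are Ollama's documented behaviour; the model name is a placeholder, and a fetch implementation with streaming support is assumed to be available in the plugin environment.

```typescript
// Streams tokens from a local Ollama instance; the model name is a placeholder.
async function* generateLocally(prompt: string, model = 'llama3.2'): AsyncGenerator<string> {
  const res = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, prompt, stream: true }),
  });
  if (!res.ok || !res.body) throw new Error(`Ollama request failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // Ollama streams one JSON object per line.
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? '';
    for (const line of lines) {
      if (!line.trim()) continue;
      const data = JSON.parse(line);
      if (data.response) yield data.response;  // emit partial text as it arrives
      if (data.done) return;
    }
  }
}
```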

Integration with the Joplin Codebase

Since this is a plugin, it does not modify Joplin's core codebase. Instead it integrates through the Joplin Plugin API (a short usage sketch follows the list below):

  • joplin.data.get(['notes']) to read note content

  • joplin.views.panels to create and manage the chat panel, including create, setHtml, postMessage and onMessage for two-way communication between the plugin and the webview

  • joplin.commands.execute('openNote', noteId) to navigate to a note

  • joplin.workspace.onNoteChange to detect when notes are edited and trigger re-indexing

  • joplin.data.get('search', {query: userQuery}) to access Joplin's built-in full text search for the BM25 keyword search component
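
To make the integration concrete, here is a minimal sketch of how the plugin entry point could wire these calls together. The panel HTML, message shapes and the answerQuestion and scheduleReindex helpers are illustrative placeholders around the real API calls listed above.

```typescript
import joplin from 'api';

// Assumed helpers implemented elsewhere in the plugin (RAG pipeline and indexer).
declare function answerQuestion(text: string): Promise<string>;
declare function scheduleReindex(noteId: string): Promise<void>;

joplin.plugins.register({
  onStart: async () => {
    // Create the chat panel and load its webview content.
    const panel = await joplin.views.panels.create('aiChatPanel');
    await joplin.views.panels.setHtml(panel, '<div id="chat-root"></div>');
    await joplin.views.panels.addScript(panel, './webview.js'); // bundled React UI

    // Webview -> plugin messages: questions and citation clicks.
    await joplin.views.panels.onMessage(panel, async (message: any) => {
      if (message.type === 'ask') {
        const answer = await answerQuestion(message.text);
        await joplin.views.panels.postMessage(panel, { type: 'answer', answer });
      } else if (message.type === 'openNote') {
        await joplin.commands.execute('openNote', message.noteId);
      }
    });

    // Re-index edited notes (debounced in the real implementation).
    await joplin.workspace.onNoteChange(async event => {
      await scheduleReindex(event.id);
    });
  },
});
```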

Libraries and Technologies

  • TypeScript — main language for the plugin, consistent with Joplin's codebase

  • React — building the chat UI components, which I have 4 years of experience with

  • Transformers.js (Hugging Face) — running BGE-small-en-v1.5 locally in JavaScript for offline embeddings. Chosen because it runs ONNX models entirely in JavaScript with no Python needed, which is critical for a Joplin plugin environment

  • SQLite with sqlite-vec — storing embeddings and chat history locally. Chosen over JSON for speed and over userData approach for simplicity and cleanliness

  • OpenRouter API — cloud LLM generation with access to multiple models through a single API key. I have production experience with this from my career AI app where I integrated Nemotron, Mistral and Gemini

  • Ollama — local LLM for fully offline generation, aligning with Joplin's offline-first philosophy

  • Joplin Plugin API — reading notes, creating panels, navigating to notes

  • ms-marco-MiniLM-L-6-v2 via Transformers.js — reranking retrieved chunks by how well they answer the user's query, running fully locally with no API key needed

  • Reciprocal Rank Fusion — merging vector and keyword search result lists into a single ranked output

Potential Challenges

Challenge 1 — First-time indexing on a new device

Users with large note collections may experience a slow initial indexing process when opening Joplin on a new device. The plugin handles this by running indexing in the background while showing a progress indicator, allowing the user to continue using Joplin normally.

Challenge 2 — Token budget management

The LLM can only process a limited number of tokens at once. If the most relevant chunks exceed the available context window, some information will be truncated. The plugin manages this by calculating the available token budget after reserving space for the system message, user question, conversation history and LLM response, then fitting as many relevant chunks as possible within the remaining space.

Challenge 3 — Keeping the index fresh

When users frequently edit notes, the index needs to stay up to date without impacting performance. The plugin handles this by using a content hash to detect which notes have actually changed, so only modified notes are re-embedded. Re-indexing is also debounced by 5 seconds after a note is saved, meaning multiple quick edits only trigger one re-embedding call.
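
A small sketch of how this could look in indexer.ts is shown below; the storage and note-fetching callbacks are placeholders for the real plugin modules.

```typescript
import { createHash } from 'crypto';

// Sketch of the freshness logic in indexer.ts; the injected callbacks are illustrative.
const DEBOUNCE_MS = 5000;
const pendingTimers = new Map<string, ReturnType<typeof setTimeout>>();

function contentHash(body: string): string {
  return createHash('sha256').update(body).digest('hex');
}

function scheduleReindex(
  noteId: string,
  getNote: (id: string) => Promise<{ body: string }>,
  getStoredHash: (id: string) => Promise<string | null>,
  reembed: (id: string, body: string) => Promise<void>,
): void {
  // Multiple quick edits reset the timer, so only one re-embedding call fires.
  const existing = pendingTimers.get(noteId);
  if (existing) clearTimeout(existing);
  pendingTimers.set(noteId, setTimeout(async () => {
    pendingTimers.delete(noteId);
    const note = await getNote(noteId);
    const hash = contentHash(note.body);
    if (hash === await getStoredHash(noteId)) return; // unchanged, skip re-embedding
    await reembed(noteId, note.body);
  }, DEBOUNCE_MS));
}
```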

Challenge 4 — Model consistency

The embedding model used for notes and queries must always match, otherwise vectors exist in incompatible spaces. This is handled by using BGE-small via Transformers.js for all embeddings at all times — both when indexing notes and when processing queries. Since the model never changes, consistency is always guaranteed. OpenRouter is only used for LLM generation, not for embeddings.

Testing Strategy

Unit tests:

  • Chunking: Test against notes with headings, notes without headings, very long notes and empty notes. Pass condition: chunks always contain the correct metadata (note title, heading path, line number and note ID).
  • RRF merger: Test with two mock result lists where some results overlap and some don't. Pass condition: overlapping results rank higher than non-overlapping ones and scores match the formula 1/(k + rank).
  • Prompt builder: Test with chunk sets that exceed the token budget. Pass condition: total tokens never exceed the limit and the lowest scoring chunks are dropped first.
  • Reranker: Test with a fixed query and fixed result set. Pass condition: output ordering differs from input ordering, confirming the reranker is actively reordering results.

Integration tests:

  • Run the full pipeline on a small collection of 20 test notes with known content. Submit queries with known correct answers and verify the correct notes appear in citations.
  • Test the mobile fallback path — simulate unavailable local embedding and verify the plugin switches to the Hugging Face Inference API without crashing.

Performance tests:

  • Manually test indexing and query speed on collections of 100, 1000 and 5000 notes and record results.

4. Implementation Plan

Week 1: Familiarisation and Setup — Study the Joplin Plugin API documentation and existing plugin examples. Explore the codebase to understand how plugins interact with notes, panels and commands. Set up the plugin scaffold and development environment. Familiarise myself with Transformers.js and sqlite-vec by running small experiments locally.

Week 2: UI Components — Build all UI components — ChatPanel, MessageInput with @ mention, SourcesPanel, HistoryPanel, SettingsPage and IndexStatus — without backend functionality. UI will be static at this stage but fully designed and navigable on both desktop and mobile layouts.

Week 3: Chunking — Implement note reading using joplin.data.get(['notes']). Build the chunking system that splits notes at heading boundaries first, then by paragraph and semantic breaks, then by sliding window with overlap when needed. Store metadata alongside each chunk — note title, heading path, line number and note ID.

Week 4: Embedding — Integrate Transformers.js and BGE-small-en-v1.5 for generating embeddings from chunks on desktop. Integrate Hugging Face Inference API as the embedding path for mobile. Send chunks to the model one by one or in parallel batches. Verify that the same model is used consistently for both notes and queries on desktop.

Week 5: Storage — Set up SQLite with sqlite-vec for storing embeddings and metadata on desktop. Implement the RAM cache that loads all vectors into memory on first search. Store chat history and conversation records in SQLite alongside embeddings. On mobile, embeddings are generated on demand via the Hugging Face Inference API with no local index maintained.

Week 6: Indexing Pipeline — Add content hash detection so only changed notes are re-embedded. Implement debounced re-indexing triggered 5 seconds after a note is saved. Add background indexing on first run or new device with a progress indicator.

Week 7: Search — Implement Q8 quantization for vectors. Build brute force cosine similarity search against the RAM cache. Implement top-K heap to track only the highest scoring chunks during the scan. Add keyword search using Joplin's built-in full text search via joplin.data.get('search', {query: userQuery}). Implement Reciprocal Rank Fusion to merge vector and keyword results into one ranked list. Add result aggregation by note and metadata extraction for citations.

Week 8: Reranker and Prompt Builder — Integrate ms-marco-MiniLM-L-6-v2 via Transformers.js for reranking the merged results. Build the prompt builder that constructs the final prompt with role instruction, grounding instruction, retrieved note chunks as context and the user question. Implement token budget management — approximately 200 tokens for system prompt, 500 for conversation history, and 2000 for retrieved chunks, with lowest scoring chunks dropped first when the budget is exceeded.

Week 9: LLM Integration — Integrate OpenRouter API for cloud generation with streaming response so answers appear token by token. Integrate Ollama for fully offline generation on desktop. Add the no notes found fallback that offers the user the option to answer from general knowledge.

Week 10: Connecting UI to Backend — Connect all backend logic to the UI components. Citations become clickable with Go to note navigation. SourcesPanel populates from real search results. HistoryPanel saves and loads real conversations from SQLite. @ mention triggers real note search and focuses the conversation on the selected note. IndexStatus shows real indexing progress.

Week 11: Testing — Write unit tests for chunking, RRF merger, prompt builder and reranker. Write integration tests for the full pipeline on a small note collection. Test mobile fallback path. Test performance on collections of 100, 1000 and 5000 notes. Fix bugs discovered during testing.

Week 12: Polish and Documentation — Final bug fixes and performance improvements. Write documentation covering setup, configuration and usage. Final review and submission.

5. Deliverables

  • A fully functional Joplin plugin implementing AI chat with notes using RAG

  • Core chat interface with numbered citations and Go to note navigation

  • Offline support via Ollama with fallback to OpenRouter for cloud generation

  • @ mention support for focusing the chat on specific notes

  • Sources panel showing all notes referenced in a conversation

  • Chat history saved and accessible across sessions

  • Background indexing with progress indicator on first run and new devices

  • No notes found fallback with option to answer from general knowledge

  • Works on both desktop and mobile

  • Unit tests for core pipeline components — chunking, embedding, search and prompt builder

  • Documentation covering setup, configuration and usage

6. Availability

I am available for approximately 30 to 35 hours per week during GSoC. I am based in Nigeria (WAT, UTC+1). During weekdays I can commit more time, around 8 to 10 hours per day when possible, and around 5 hours on weekends. I have university examinations that are expected to fall sometime in June or July. I do not have the exact dates yet but I will communicate with my mentors as soon as the schedule is confirmed so we can plan around it. Outside of the exam period I have no other commitments that would affect my availability.

AI Assistance Disclosure

AI was used to improve the grammar and structure of this proposal. All ideas, design decisions, technical reasoning, and the UI wireframe were developed by me.