GSoC 2026 Proposal Draft – Idea 4: Chat with your note collection using AI – Jalina_Hirushan

Hi everyone! I am preparing a proposal for the "Chat with your note collection using AI" project and would love to get some early feedback from the mentors and community on my architectural approach.

AI Assistance Disclosure

This proposal's core architecture, implementation logic, and timeline were developed manually through researching the Joplin codebase, the plugin API, and forum discussions. AI tools were utilized strictly to improve the clarity, grammar, and structural formatting of the text. I have carefully reviewed all text, verified its correctness, and confirm I fully understand the proposed architecture and implementation.

Links

  • Project Idea: Chat with your note collection using AI
  • GitHub Profile: JalinaH
  • Forum Introduction: Introducing Jalina Hirushan
  • Pull Requests:
    • PR #14667 (Merged)
    • PR #14481 (Closed)
    • PR #14473 (Closed)
    • PR #14460 (Closed)
    • PR #14459 (Closed)
    • PR #14425 (Closed)

1. Introduction

  • Background / studies: I am a third-year IT undergraduate at the University of Moratuwa, Sri Lanka. I previously engineered a production-grade RAG chatbot using Next.js, TypeScript, and Supabase's vector database (pgvector) to vectorize and query notes. Building this system gave me a solid understanding of vector mathematics, context window management, and chunk overlap strategies. The core engineering challenge I am tackling for this GSoC project is an architectural transposition: moving these established cloud-native RAG mechanics into Joplin's local, offline Electron sandbox.
  • Programming experience: I have strong proficiency across the MERN stack, with deep experience in React/Next.js, TypeScript/JavaScript, and Node.js. My specialized experience includes building full-stack AI applications involving Retrieval-Augmented Generation (RAG), working with vector databases, using WebAssembly (WASM) to work around Joplin's strict restriction against bundling native Node.js (C++) modules, and managing asynchronous, CPU-bound tasks in Web Workers.
  • Experience with open source: I am an active contributor to Joplin. Through my past pull requests, I have familiarized myself with the monorepo structure, React/Redux components, and the core contribution guidelines.

2. Project Summary

  • What problem it solves: As users accumulate vast markdown knowledge bases, traditional keyword search fails to uncover nuanced relationships. Users lack a way to interrogate their own curated data contextually.
  • What will be implemented: A local-first, privacy-preserving Retrieval-Augmented Generation (RAG) assistant built entirely as a Joplin plugin.
  • Expected outcome: A conversational React-based chat panel within the application where users can ask natural language questions. The AI will generate answers strictly grounded in the user's local notes and provide clickable citations back to the original source notes.

3. Technical Approach

  • Architecture or components involved: The pipeline involves note ingestion via the Joplin Data API, structural markdown chunking, vector embedding, semantic retrieval, and LLM prompt building.
  • Changes to the Joplin codebase: The project will be developed entirely as a plugin using Joplin's Plugin API to maintain modularity and keep heavy AI dependencies out of the core application.
  • Libraries or technologies you plan to use: To bypass Joplin's strict native Node.js module restrictions, the architecture relies heavily on WebAssembly (WASM). I will use sqlite-vec for local, entirely offline vector storage, as it integrates perfectly with Joplin's existing SQLite environment. Embedding generation will utilize lightweight quantized models (like BGE-small-en-v1.5) via Transformers.js. LLM inference will default to local providers like Ollama, with opt-in cloud API fallbacks.
  • Potential challenges: Running CPU-bound tasks like Transformers.js matrix multiplication on the main Electron thread will cause the Joplin UI to freeze. I will mitigate this by delegating embedding generation to a background Web Worker via asynchronous messaging.
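To illustrate the threading model just described, here is a minimal TypeScript sketch of how panel-side code could hand embedding work to a Web Worker and await the result asynchronously. The worker file name, message shape, and request-tracking scheme are illustrative assumptions, not final API decisions:

```typescript
// Minimal sketch, assuming a worker script bundled as embedWorker.js
// that runs Transformers.js and replies with Float32Array vectors.
const worker = new Worker('embedWorker.js');

let nextId = 0;
const pending = new Map<number, (vectors: Float32Array[]) => void>();

worker.onmessage = (e: MessageEvent) => {
  // Each reply carries the id of the request it answers.
  const { id, vectors } = e.data as { id: number; vectors: Float32Array[] };
  pending.get(id)?.(vectors);
  pending.delete(id);
};

// Resolves off the main thread, so the Joplin UI stays responsive
// while the model crunches matrix multiplications in the worker.
function embed(texts: string[]): Promise<Float32Array[]> {
  return new Promise((resolve) => {
    const id = nextId++;
    pending.set(id, resolve);
    worker.postMessage({ id, type: 'embed', texts });
  });
}
```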

4. Implementation Plan

(Based on the 350-hour Large project requirement mapped over 12 weeks)

  • Week 1–2: Infrastructure & Database. Scaffold the plugin architecture and establish the UI settings page (API key inputs, model selection). Initialize the sqlite-vec database schema and validate WASM loading within the plugin environment.
  • Week 3–4: Ingestion & Chunking. Implement the note ingestion engine using the Joplin Data API in paginated batches. Develop the structural Markdown chunking algorithm (splitting by headers and using a sliding token window with overlap; see the sketch after this list) and hash notes via SHA-256 for differential syncing.
  • Week 5–6: Embedding Pipeline & Vectorization. Integrate Transformers.js and load the local ONNX model. Implement Web Worker threading so the background embedding process does not block the main application thread. Implement cosine similarity search within sqlite-vec.
  • Week 7–8: Prompt Building & LLM Integration. Develop the Prompt Builder to enforce strict token budgets and wrap context in XML tags. Integrate the LLM provider interface, supporting local endpoints (Ollama) and cloud APIs (OpenAI/Gemini) with robust error handling for timeouts and connection refusals.
  • Week 9–10: React Chat UI & IPC Integration. Develop the React-based sidebar chat panel. Implement inter-process communication (IPC) to pass user queries to the backend and stream tokens back to the UI. Implement parsing logic to convert LLM citation markers into clickable deep links using joplin.commands.execute('openNote'); a sketch of this also follows the list.
  • Week 11–12: Testing, Polish & Delivery. Handle edge cases (e.g., empty databases, missing native dependencies, model download failures). Write comprehensive unit tests for chunking and IPC logic. Finalize user and developer documentation and prepare the final GSoC evaluation report.
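As referenced in the Week 3–4 item, below is a minimal sketch of the structural chunking idea: split on Markdown headers first, then apply a sliding token window with overlap to oversized sections. The window sizes and the whitespace tokenizer are placeholder assumptions; the real implementation would count tokens with the embedding model's tokenizer.

```typescript
// Minimal sketch, assuming a crude whitespace "token" approximation.
function chunkMarkdown(body: string, maxTokens = 256, overlap = 32): string[] {
  // First pass: split on top-level and second-level headers.
  const sections = body.split(/^(?=#{1,2}\s)/m);
  const chunks: string[] = [];

  for (const section of sections) {
    const tokens = section.split(/\s+/).filter(Boolean);
    if (tokens.length <= maxTokens) {
      if (tokens.length > 0) chunks.push(section.trim());
      continue;
    }
    // Second pass: sliding window with overlap over long sections,
    // so context is preserved across chunk boundaries.
    for (let start = 0; start < tokens.length; start += maxTokens - overlap) {
      chunks.push(tokens.slice(start, start + maxTokens).join(' '));
    }
  }
  return chunks;
}
```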
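And for the Week 9–10 item, a minimal sketch of the citation path under the panel/plugin split: the WebView turns `[n]`-style markers into links and relays clicks as messages, and the plugin process opens the note via the command named above. The marker format and message shape are assumptions for illustration.

```typescript
// --- WebView side (panel script): turn "[1]"-style markers into links
// and relay clicks to the plugin process via webviewApi.postMessage.
declare const webviewApi: { postMessage(msg: any): Promise<any> };

function linkCitations(answer: string, noteIds: string[]): string {
  return answer.replace(/\[(\d+)\]/g, (m, n) => {
    const id = noteIds[Number(n) - 1];
    return id
      ? `<a href="#" onclick="webviewApi.postMessage({ type: 'openNote', noteId: '${id}' })">${m}</a>`
      : m;
  });
}

// --- Plugin side (index.ts): receive the message and open the note.
import joplin from 'api';

async function registerOpenNoteHandler(panelHandle: string) {
  await joplin.views.panels.onMessage(panelHandle, async (msg: any) => {
    if (msg?.type === 'openNote' && msg.noteId) {
      await joplin.commands.execute('openNote', msg.noteId);
    }
  });
}
```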

5. Deliverables

  • Implemented features: A complete, local-first RAG plugin featuring a background synchronization engine, sqlite-vec semantic search, and an interactive React-based chat sidebar with source citations.
  • Tests: Unit test suites for the structural chunking algorithm, database schema integrity, and IPC messaging bridges.
  • Documentation: A comprehensive user guide for configuring local LLM models (e.g., Ollama) alongside developer documentation detailing the Web Worker architecture.

6. Availability

  • Weekly availability during GSoC: I can commit ~30 hours per week to comfortably meet the 350-hour requirement for this Large-sized project.

  • Time zone: Asia/Colombo (UTC+5:30)

  • Other commitments: I am currently in an internship that concludes in June. I have carefully evaluated my schedule and mapped out my availability to ensure I comfortably meet the 350-hour requirement. My planned schedule is:

  • During internship (May–June): ~20 hours per week.

  • Post-internship (June–August): ~35 hours per week.

If mentors prefer a less intensive pace during the overlap, I am open to using GSoC's flexible timeline rules to extend the coding period to 14 or 16 weeks, ensuring the highest possible code quality without burnout. I want to be upfront about my schedule so we can plan the summer effectively.

Hello, if you haven't already, consider providing your input on this thread as it could be relevant to your project:

Thanks for sharing, I’ll take a look.

Hey @JalinaH, have you tested sqlite-vec's WASM build inside a Joplin plugin? The AI summarisation plugin by @HahaBill is worth looking at for how ONNX WASM loading was handled there.

What does "strict token budgets" mean in practice for a long conversation - how do you manage the context window as history grows?


Hi, thanks for taking the time to review my draft and for the excellent questions!

Regarding sqlite-vec and WASM: I actually just built a Proof of Concept plugin today to test the sqlite-vec WASM compilation and hit the exact Electron environment boundaries you are hinting at!

When I initially attempted to instantiate sqlite-vec-wasm directly in the plugin's background script (index.ts), Emscripten immediately threw a fatal "not compiled for this environment" error. This happens because the background script runs in a Node.js context, while the WASM binary is explicitly compiled for browser/Web Worker environments so it can leverage OPFS (Origin Private File System) for persistent storage.

This validates my architectural decision: the sqlite-vec database and Transformers.js cannot live in the Node background script. They must be instantiated inside a hidden Web Worker spawned from the React Panel (the WebView context).

To handle the Webpack bundling issues that @HahaBill encountered with ONNX, my Webpack configuration uses CopyWebpackPlugin to copy the raw .wasm binaries directly into the dist folder (sketched below). The WebView worker then fetches them asynchronously at runtime, completely bypassing Webpack's node-loader trap.
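For reference, the relevant part of the configuration looks roughly like this; the source path is a hypothetical assumption that depends on how the sqlite-vec WASM package lays out its dist folder:

```typescript
// webpack.config excerpt: a minimal sketch. The 'from' path is a
// hypothetical package layout; adjust it to the actual one.
import CopyPlugin from 'copy-webpack-plugin';

export default {
  plugins: [
    new CopyPlugin({
      patterns: [
        // Ship the raw .wasm binaries untouched so the WebView worker
        // can fetch() them at runtime, instead of Webpack trying to
        // resolve them as Node modules.
        { from: 'node_modules/sqlite-vec-wasm/dist', to: 'wasm' },
      ],
    }),
  ],
};
```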

Regarding "strict token budgets": As a multi-turn conversation grows, the chat history can easily consume the LLM's entire context window. If that happens, it either pushes out the retrieved RAG notes or causes the model to suffer from "lost-in-the-middle" hallucinations.

A strict token budget solves this by assigning hard boundaries to the prompt structure. For example, if we use a model with a 4K context window, the Prompt Builder enforces a split like this:

  • System Instructions: ~300 tokens
  • Retrieved Notes (The RAG Context): ~1500 tokens (Hard maximum)
  • Conversation History: ~1000 tokens
  • Reserved for Generation: ~1200 tokens

In practice, this means the retrieved notes always have guaranteed space. As the chat history grows beyond its 1000-token allocation, the system will apply a compaction strategy, either utilizing a sliding window that drops the oldest turns, or running a quick background summary of the past conversation to compress it. This ensures the LLM always prioritizes the freshly retrieved notes for its current answer.
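A minimal sketch of the sliding-window side of that compaction, assuming a `countTokens` helper backed by the model's tokenizer (a crude whitespace count stands in for it here):

```typescript
interface Turn { role: 'user' | 'assistant'; text: string; }

// Hypothetical tokenizer hook; real code would use the model's tokenizer.
const countTokens = (text: string): number => text.split(/\s+/).length;

// Keep the most recent turns that fit the history budget (e.g. the
// 1000 tokens in the 4K example above), dropping the oldest first.
function fitHistory(turns: Turn[], budget = 1000): Turn[] {
  const kept: Turn[] = [];
  let used = 0;
  for (let i = turns.length - 1; i >= 0; i--) {
    const cost = countTokens(turns[i].text);
    if (used + cost > budget) break;
    kept.unshift(turns[i]);
    used += cost;
  }
  return kept;
}
```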

Let me know if you have any other questions about the architecture!

Thanks @JalinaH, the PoC was useful for finding the WASM boundary.

If a user asks a question that combines a specific term/keyword from one note with a broader topic from another, how does cosine similarity handle that?


Hi again, thanks for the great follow-up question!

To be completely transparent, pure cosine similarity actually struggles with this exact scenario. While dense vectors are fantastic at understanding broad semantic topics, they often miss specific keywords, exact names, or acronyms because the vector represents the average overall meaning of the text, which dilutes individual words.

To solve this and handle queries that combine specific terms with broad topics, the architecture must implement a Hybrid Search strategy.

Here is how the system handles it:

1. Parallel Retrieval: Instead of relying solely on vector search, the system runs two retrieval processes in parallel:

  • Dense Vector Search (sqlite-vec + Cosine Similarity): Retrieves the chunks that match the broader semantic topic.
  • Sparse Keyword Search (BM25): Retrieves the chunks containing the exact specific term/keyword. Since Joplin's core search engine already utilizes BM25 for ranking, leveraging this algorithm aligns perfectly with the existing ecosystem.

2. Reciprocal Rank Fusion (RRF): Once both searches return their top candidate chunks, we merge the two lists using Reciprocal Rank Fusion (RRF). RRF is a highly effective algorithm that evaluates the positions of items across multiple ranked lists to produce a single, unified result set. It ensures that chunks ranking highly in either the keyword search or the semantic search are pushed to the top of the final list.

3. Multi-Hop Synthesis via the LLM: Because RRF successfully surfaces the best chunks from both the keyword match (e.g., Note A) and the semantic match (e.g., Note B), the Prompt Builder injects both distinct chunks into the LLM's context window. The LLM is then responsible for the "multi-hop" reasoning, synthesizing the specific term from Note A with the broader context from Note B to generate a unified, grounded answer for the user.
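To make step 2 concrete, here is a minimal sketch of RRF over the two ranked lists of chunk IDs, using the conventional constant k = 60; the function name and list shapes are illustrative:

```typescript
// Minimal RRF sketch: each list contributes 1 / (k + rank) per chunk,
// so chunks ranked highly in either list float to the top of the merge.
function rrfMerge(denseIds: string[], sparseIds: string[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of [denseIds, sparseIds]) {
    list.forEach((id, i) => {
      const rank = i + 1; // 1-based rank within this list
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```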

Let me know if you would like me to elaborate on how the RRF scoring would be implemented alongside sqlite-vec!
