GSoC 2026 Proposal Draft – Idea 4: Chat with your note collection using AI – Harsh16gupta

Hello everyone
Link to the project idea: Chat with your note collection using AI
GitHub profile: Harsh16gupta
Forum introduction post: Introducing Harsh16gupta
Pull requests:

| PR | Description | Status |
|---|---|---|
| #14591 | Auto-scroll to selected note from 'Go to Anything' search results | Merged |
| #14503 | Add new option to disable the Joplin icon for internal note links | Merged |
| #14474 | Copying from markdown preview including theme background colour | Merged |
| #14529 | Translate Find and Replace dialog in Rich Text editor | Merged |
| #14423 | Prevent 4th backtick when closing fenced code block | Merged |
| #14410 | Added video tutorials to documentation pages | Merged |
| #14561 | Added the pdf viewer for the Rich text editor | Open |
| #14767 | ABC Sheet Music rendering out of bound | Merged |
| #14749 | Fixed Custom Dictionary.txt being saved to wrong directory | Open |

1. Introduction

Hi, I am Harsh Gupta, a third-year B.Tech student at Harcourt Butler Technical University, Kanpur. I was introduced to programming in high school, and since then I have really enjoyed solving problems through code and building useful software. While working on personal and collaborative projects has been a great learning experience, open source has given me the opportunity to contribute to large, real-world codebases used by many people.

1.1 Past Experience in Software Development

GRS Worker (Freelanced Project)
Designed and developed a full website in a 2-member team, handling system architecture, responsive UI, and backend integration. (Live | GitHub)

Chess.in (Real-Time Online Chess Platform)
Built a real-time online chess platform with WebSocket-based live gameplay, synchronized state updates, and in-game chat. (GitHub)

See2Say (AI Vision-to-Speech Platform)
Developed an AI pipeline that converts video frames into narrated audio using OpenCV, BLIP captioning, Gemini summarization, and gTTS. (GitHub)

1.2 Open-Source Experience

My first open-source contribution was to the AsyncAPI Initiative, where I worked mainly on the AsyncAPI Generator project for about one and a half months, merging 18 pull requests.

Some of my key contributions include:

  • Refactored Python WebSocket helpers from asynchronous to synchronous execution while preserving behavior (PR #1918)
  • Updated the AsyncAPI Python Template tutorial to support AsyncAPI v3 (PR #1826). The maintainer appreciated the contribution and asked me to replace his repo link with my implementation, which is now referenced in the official documentation.

2. Introducing JIVA (Project Summary)

Joplin users often store notes, clipped webpages, documents, research material, and other useful information in the app. Over time, many users accumulate hundreds or even thousands of notes, forming a carefully curated collection of knowledge. When users want to understand a topic that is spread across multiple notes, they usually have to search manually and read through several documents to find the information.

This project proposes JIVA (Joplin Intelligent Virtual Assistant), a plugin that will allow users to chat with their Joplin notes. A way to chat with one's notes is a feature that has been requested several times on the forum. When I started taking notes in Joplin, I also kept thinking how cool it would be to chat with my notes.

Why a Plugin Instead of an External Application

While working on this idea, one of the first decisions I had to make was whether the project should be built as a plugin inside Joplin or as a separate external application that connects to Joplin.

After exploring both possibilities, I chose to build it as a plugin because:

  • Direct access to notes: A plugin can use Joplin’s plugin API to access the user’s notes directly, which simplifies indexing and processing the note collection.
  • Better user experience: Since the assistant runs inside Joplin, users can ask questions without leaving the application and easily open the notes referenced in the answers.
  • Simpler setup: Users can install the plugin directly from the Joplin plugin ecosystem without needing to configure external services.

3. Technical Approach

3.1 Cloud Models or Local Models

The first question is which type of LLM JIVA should use. There are two main options: cloud-based APIs (such as OpenAI, Anthropic, or Google models) and local models.

While reading discussions on the Joplin forum, I also noticed that some users were concerned about API costs. Because of that, I looked into Google’s Gemini models. The Gemini Flash-Lite variant is entirely free to query within rate limits, which makes it an attractive option.

In my view, the best approach is a hybrid one: the system will support multiple LLM providers. By default, users can connect to cloud APIs such as OpenAI or Gemini for high-quality responses, but the plugin will also support running local models through tools like Ollama for users who prefer an offline setup (Jarvis also provides multiple cloud and local models).

3.2 Embedding Model Comparison

While choosing the embedding model, I evaluated candidates using the Massive Text Embedding Benchmark (MTEB). This benchmark measures how well models perform across tasks like retrieval, clustering, classification, and semantic similarity. For JIVA, the retrieval score is the most relevant metric.

| Model | Dimensions | Size | MTEB Retrieval | CPU Inference | Deployment | Plugin Fit |
|---|---|---|---|---|---|---|
| text-embedding-3-large | 3072 | API only | ~54.9 | ~15ms (API) | Cloud API | High |
| text-embedding-3-small | 1536 | API only | ~51.7 | ~12ms (API) | Cloud API | High |
| BGE-large-en-v1.5 | 1024 | 335 MB | ~54.3 | ~200ms | Local ONNX | Moderate |
| BGE-small-en-v1.5 | 384 | 24 MB | ~51.7 | ~30ms | Local ONNX/JS | Best (my recommendation) |
| all-MiniLM-L6-v2 | 384 | 23 MB | ~49.1 | ~25ms | Local ONNX/JS | Good |

Based on this comparison, BGE-small-en-v1.5 looked like the most practical option. It achieves a ~51.7 retrieval score on the MTEB benchmark while being only about 24 MB in size: just a few points behind much larger models at a fraction of the size, which makes it easier to run inside a plugin.

Edit: I tested the embedding model based on the feedback given by Bill (other mentor)

I tested two models (all-MiniLM-L6-v2 and BGE-small-en-v1.5) by running Transformers.js v3 in the plugin sandbox. My main finding is that BGE-small is very slow: it takes twice as long to embed the same notes as all-MiniLM-L6. So my final choice for the embedding model is all-MiniLM-L6-v2.
Link : Forum discussion
GitHub repo link: testing-embedding-model

3.3 Embedding and Indexing Strategy

Embedding an entire note as a single vector doesn’t work well. A long note may contain multiple topics, and compressing everything into one vector often hides the specific information a user might be asking for.

The solution, established by the RAG paper and widely used in production systems since, is to split notes into smaller chunks, embed each one separately, and retrieve information at the chunk level rather than the note level.

Three chunking strategies

  • Heading-based chunking: Split the note at Markdown heading boundaries (H1, H2, H3). This is exactly what Jarvis does, as Alondmnt (creator of Jarvis) explained on the forum. Its one drawback is that it doesn't work well for notes without headings.
  • Fixed-size sliding window: Divide text into chunks of fixed length with a small overlap. This works for any note structure but may split sentences.
  • Semantic segmentation: Detect topic changes using similarity between sentences. This is more advanced but adds extra computation and complexity.

All three strategies share a limitation: the chunk size must be decided at indexing time, before the query is known.
For example, some questions are very small, like finding a single fact; for those, a big chunk (say 400 tokens) is too much. But for bigger questions, those same 400 tokens might actually be too little. So one fixed size doesn't really work for everything.

This can be solved with dsRAG. Instead of trying to guess the perfect chunk size, it keeps chunks small and adjusts things later during retrieval. dsRAG uses Relevant Segment Extraction (RSE): instead of returning fixed chunks, it combines multiple small chunks into a larger segment depending on the query, so the context size becomes dynamic.

On the KITE benchmark, RSE alone improved retrieval scores from 4.72 to 6.73 compared to standard top-k retrieval. When combined with Contextual Chunk Headers, the score reached 8.42.

Because of this, in JIVA I decided not to go with large chunks. Instead, I keep chunks small, around 100–150 tokens.

Steps:

  • First, I still split by headings, so the structure of notes is preserved
  • Then inside each section, I break text into smaller chunks
  • I make sure chunks don’t cut in the middle of a sentence
  • If a chunk is too small (under ~50 tokens), I just merge it with the next one

I am keeping them small because RSE will combine them later. So if a question needs more context, it can join 2–3 chunks. If it needs less, it can just use one.
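The steps above can be sketched as a small pure function. This is a simplified illustration rather than the final implementation: token counts are approximated by word counts (a real version would use the embedding model's tokenizer), the sentence splitter is naive, and an undersized trailing chunk is merged backwards rather than forwards.

```typescript
// Approximate tokens by whitespace-separated words (simplification; the real
// implementation would use the embedding model's tokenizer).
const countTokens = (text: string): number =>
  text.split(/\s+/).filter(Boolean).length;

// Naive sentence splitter: break on ., ! or ? (plus trailing whitespace).
const splitSentences = (text: string): string[] =>
  text.match(/[^.!?]+[.!?]*\s*/g)?.map(s => s.trim()).filter(Boolean) ?? [];

const MAX_TOKENS = 150; // upper bound per atomic chunk
const MIN_TOKENS = 50;  // chunks below this get merged with a neighbour

// Split one heading section into atomic chunks without cutting sentences.
function chunkSection(sectionText: string): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  let currentTokens = 0;
  for (const sentence of splitSentences(sectionText)) {
    const t = countTokens(sentence);
    if (currentTokens + t > MAX_TOKENS && current.length > 0) {
      chunks.push(current.join(" "));
      current = [];
      currentTokens = 0;
    }
    current.push(sentence);
    currentTokens += t;
  }
  if (current.length > 0) chunks.push(current.join(" "));
  // Merge an undersized trailing chunk into the previous one.
  if (chunks.length > 1 && countTokens(chunks[chunks.length - 1]) < MIN_TOKENS) {
    const tail = chunks.pop()!;
    chunks[chunks.length - 1] += " " + tail;
  }
  return chunks;
}
```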

Before sending a chunk to the embedding model, I prepend the note title and heading path to it (called Contextual Chunk Headers, or CCH). So instead of just the chunk text, it looks something like:

[My college notes > 3rd year > Operating system]
chunk comes here...

Like this, the embedding knows both the text and its context.

In my evaluation (section 3.9), CCH didn't help on 30 notes because the topics were different enough already. But I think it will matter at scale.
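The header construction itself is tiny; a sketch matching the format shown above (the bracket-and-`>` format mirrors the example, but the exact separator is an implementation detail, not a fixed decision):

```typescript
// Build the text that is actually embedded: a Contextual Chunk Header
// (note title plus heading path) followed by the chunk body.
function withCch(noteTitle: string, headingPath: string[], chunkText: string): string {
  const header = [noteTitle, ...headingPath].join(" > ");
  return `[${header}]\n${chunkText}`;
}
```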

3.4 Vector Storage Options

I evaluated several vector database options (to store the embeddings and chunks), including FAISS, Chroma, Weaviate, and Vectra; my findings are below. The heavier systems are powerful but require running a dedicated service.

| Store | Type | Plugin Deploy | Scale | Metadata Filtering | Verdict |
|---|---|---|---|---|---|
| In-memory + JSON | Custom / DIY | Trivial | <50K vecs | Manual | Good fallback |
| sqlite-vec | SQLite extension (npm) | Yes (npm pkg) | <500K vecs | Full SQL WHERE | My recommendation |
| Vectra (npm) | File-backed JSON | Pure JS | <100K vecs | Object filters | Good alternative |
| FAISS (faiss-node) | C++ native library | Complex native build | Millions | None built-in | Overkill |
| Chroma | Python server | Needs Python runtime | <1M vecs | Metadata dict | Too heavy |
| Qdrant / Weaviate | Rust/Go server | Requires Docker | Billions | Very rich | Way overkill |

Based on this comparison, the best option for storing embeddings appeared to be sqlite-vec, a SQLite extension built for vector similarity search. Since Joplin already uses SQLite internally, it would integrate naturally with the plugin environment and avoid the need for external services.

Edit:
Changes based on the feedback
I tested sqlite-vec in a plugin environment and it failed (the native C extension path via loadExtension() does not work in Joplin's sandbox). I then started looking for alternatives and found Voy, a vector similarity search engine written in Rust and compiled to WASM. It uses a k-d tree for indexing and has zero native dependencies. I tested it inside the plugin sandbox and it works perfectly: I was able to create an index, add embeddings, and run NN queries without any issues.

However, when benchmarking Voy against Vectra in a Node.js environment, Voy did not work properly and showed errors, even though it worked fine in the plugin sandbox. So for simplicity I will use Vectra for now (as it is proven to work) and give Voy another try during the community bonding period.
Link : Forum discussion

3.5 Retrieval Architecture (RAG)

JIVA will use Retrieval-Augmented Generation (RAG), an approach that first retrieves relevant information from the user’s notes and then uses a language model to generate the final answer to the user’s question based on that information.

The process follows these steps:

  • Embed the user's question with the local embedding model
  • Run vector search and keyword (BM25) search, and merge the results with RRF
  • Build contiguous segments from the top chunks using RSE
  • Construct the prompt from the retrieved segments and the conversation history
  • Generate the final answer with the selected LLM, citing the source notes

For users who want higher quality answers, JIVA also supports an optional reranking step after RSE, where a cross-encoder model re-scores the retrieved segments by reading the query and each segment together before passing them to the LLM.

3.6 UI Integration in Joplin

This section covers how users will interact with JIVA. Joplin's Plugin API offers three main UI primitives, and each serves a different interaction pattern.

I decided to go with a persistent chat panel. Dialogs would be fine for one-off queries, but they are poor for multi-turn conversation (the history is not visible).

How the panel IPC works

joplin.views.panels.create() instantiates the panel. The plugin loads HTML/JS into it via setHtml(). The panel renders in the sidebar on desktop. Two-way communication between the webview and the plugin uses asynchronous message passing.

The panel-based UI approach is also something I have used while learning Joplin plugin development. I built a simple Table of Contents plugin based on the docs. Getting the panel to update dynamically as the note changed took a bit of experimentation, but once it worked it helped me understand how plugins interact with the Joplin UI.

3.7 Privacy and Security

Since privacy is a core principle of Joplin, JIVA will be designed to be private by default. Most processing will happen locally on the user’s device, and both the embeddings and vector index will remain on the device. This approach is similar to how the Jarvis Plugin for Joplin performs semantic search locally.

If users choose to run a local model (for example through Ollama), the entire query process can work completely offline. For users who prefer cloud APIs, the plugin will send only the user’s question and a few relevant note excerpts.

Users will have full control over these settings. Cloud APIs will require explicit permission and personal API keys, and users will be able to disable cloud features or restrict which notebooks the assistant can access.

3.8 Final Architecture Design

I also looked at how other note-taking tools approach AI/ML integration. For example, Notion AI and Mem.ai use cloud-based systems where indexing and retrieval happen on external servers, whereas Obsidian Smart Connections and Logseq AI follow a more local-first approach.

Within Joplin, Jarvis already handles local and cloud models well, allowing the user to choose. JIVA will follow a similar idea: local embeddings and vector storage by default, with optional support for multiple LLM providers.

Final Architecture:

| Component | Final Choice |
|---|---|
| LLM: Default (cloud) | OpenAI GPT-4o / Gemini Flash |
| LLM: Local option | Ollama (LLaMA 3.1 / Mistral) |
| Embedding | all-MiniLM-L6-v2 via ONNX |
| Chunking strategy | Heading-first + atomic 100–150 token chunks with CCH |
| Vector store | Vectra (primary) + in-memory fallback |
| Retrieval pipeline | vector search (Vectra) + BM25 → RRF → RSE → LLM |
| Reranker (optional) | Xenova/ms-marco-MiniLM-L6-v2 via Transformers.js ONNX |
| UI | Panel webview (React + IPC) |
| Privacy default | Local-first approach and consent for cloud calls |

3.9 Retrieval Quality Evaluation

The dsRAG benchmarks are run on their own test data, so I wanted to check whether this pipeline actually works on Joplin-style notes (as Shikuz pointed out). To test this I built a small evaluation harness: I created 30 notes that look like real Joplin notes (meeting notes, tech docs, study notes, recipes, TODO lists, clipped articles) and wrote 32 questions against them.

The harness runs retrieval across 7 different pipeline configurations and measures Hit Rate@k and MRR (Mean Reciprocal Rank).

| Configuration | HR@1 | HR@3 | HR@5 | MRR |
|---|---|---|---|---|
| vector-only | 0.97 | 1.00 | 1.00 | 0.98 |
| vector+CCH | 0.97 | 1.00 | 1.00 | 0.98 |
| bm25-only | 0.80 | 0.87 | 0.93 | 0.85 |
| hybrid-RRF | 0.90 | 0.97 | 1.00 | 0.93 |
| hybrid-RRF+RSE | 0.87 | 0.97 | 1.00 | 0.91 |
| vector+RSE | 0.87 | 1.00 | 1.00 | 0.92 |
| hybrid+RSE (no CCH) | 0.90 | 1.00 | 1.00 | 0.94 |

My findings:

  • Vector search alone is already really strong on a small corpus. It gets the right note at rank 1 almost every time.
  • BM25 is weaker (0.85 MRR). My implementation doesn't do stemming, so "blocking" doesn't match "blocked", and many similar words are missed because of this.
  • Hybrid RRF with equal weights actually pulls the score down a bit. That tells me I need to give more weight to vector search and use BM25 more as a safety net.
  • CCH and RSE don't help much. With only 30 notes and 98 chunks, there isn't enough overlap between topics for them to make a difference. These components are designed for larger collections where many notes cover similar topics.

There was one question that failed across all configurations (it needed information from 3 different notes). That's exactly the kind of question where a bigger corpus and better retrieval will help.

During the community bonding period I'll expand the corpus to 100+ notes with overlapping topics and harder questions.
GitHub link: retrieval-eval

Problem which I might face:

While working on this project, there are a few challenges that I may encounter:

  1. Plugin environment compatibility: Running ML libraries inside Joplin's plugin environment (Electron and Webpack) may require additional configuration, especially for packages that depend on WebAssembly or native files. I will validate the embedding model integration early to ensure it works correctly in the plugin environment. This is the same problem that came up in HahaBill's AI summarisation plugin.

  2. Initial model download: The embedding model will be downloaded the first time JIVA runs. If the download fails or the user is offline, it may affect the setup experience. To address this, I will implement clear progress indicators, retries, and fallback options where possible. As the model is very small, this is unlikely to be a major problem.


4. Implementation Plan

This section explains how JIVA will work end to end, from reading notes to generating the final answer.

4.1 Note Ingestion and Chunking

4.1.1 Reading Notes from Joplin

The indexer will fetch notes through the Joplin Data API in paginated batches of 50, requesting only the fields needed in order to keep memory usage low.
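The fetch loop can be sketched as follows. The page fetcher is injected so the loop stays unit-testable; in the plugin it would wrap `joplin.data.get(['notes'], { page, limit: 50, fields })`, and the concrete field list is an implementation detail not fixed here.

```typescript
// Shape of one page returned by the Joplin Data API.
interface NotePage<T> { items: T[]; has_more: boolean; }

// Fetch all notes in batches. `getPage` is injected so this loop can be
// tested without Joplin; in the plugin it would call joplin.data.get().
async function fetchAllNotes<T>(
  getPage: (page: number) => Promise<NotePage<T>>,
): Promise<T[]> {
  const all: T[] = [];
  let page = 1;
  while (true) {
    const { items, has_more } = await getPage(page);
    all.push(...items);
    if (!has_more) break;
    page += 1;
  }
  return all;
}
```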

Change detection: each note's body is hashed (SHA-256, first 16 bytes) and the hash is stored in the index_state table. On subsequent runs the hash is compared: if it matches, the note is skipped; if it differs, all existing chunks for that note are deleted and the note is re-ingested. This prevents re-embedding the entire collection on every run.
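The hash comparison itself, sketched with Node's built-in crypto module (the helper names are mine, not final; "first 16 bytes" of the SHA-256 corresponds to the first 32 hex characters):

```typescript
import { createHash } from "node:crypto";

// First 16 bytes (32 hex chars) of the SHA-256 of the note body.
function noteHash(body: string): string {
  return createHash("sha256").update(body, "utf8").digest("hex").slice(0, 32);
}

// Decide what the indexer should do with a note, given the hash stored in
// index_state (undefined means the note was never indexed).
function changeAction(body: string, storedHash: string | undefined): "skip" | "reindex" {
  return noteHash(body) === storedHash ? "skip" : "reindex";
}
```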

4.1.2 Chunking Algorithm

The chunking module converts each note into smaller text segments that can be embedded and retrieved efficiently. The implementation will follow the atomic chunking strategy described in section 3.3: heading-first split, then 100–150 token atomic chunks with sentence boundaries respected, and CCH prepended before embedding.

Each chunk will store some metadata to help with retrieval and citations. This includes the note ID to reference the original note, the note title for source citations, the chunk index to track its position within the note, the heading path (for example, “Setup > Installation”) to show the exact section, and the chunk text, which is the content sent to the embedding model and LLM.

4.2 Vector Storage and Retrieval

Embeddings are stored using Vectra, a file-backed vector store that lives in the plugin's data directory and is never synced to Joplin's main database. Three data structures will make up the schema:

  • chunks: stores text and metadata for every chunk produced by the ingestion pipeline.
  • embeddings: the Vectra index storing 384-dimensional float32 vectors for nearest-neighbour search.
  • index_state: tracks each note's hash and last-indexed timestamp for incremental updates.

The chunk_index field is important: it keeps track of the order of chunks inside a note. Because of this, RSE can figure out which chunks are next to each other and treat them as one continuous passage instead of unrelated fragments.

4.2.1 Relevant Segment Extraction (RSE)

In standard RAG, we just take the top-k chunks and send them to the model. But this has a problem: even if two chunks are right next to each other in the same note, they are treated separately, so the connection between them is lost.

RSE fixes this by working at the segment level instead of the chunk level. Rather than returning individual chunks, it finds the best contiguous segment of chunks from each note and returns that as a single piece of context.

The process:

  • Run vector search and keep a larger candidate pool of scored chunks (not just the top-k).
  • Group the candidates by note and order them by chunk_index, assigning near-zero relevance to chunks that were not retrieved.
  • Within each note, find the contiguous run of chunks that maximizes the total relevance minus a length penalty, and return that run as a single segment.

The best thing here is that the context size is no longer fixed; it depends on the query. The model gets proper, connected passages instead of broken pieces, which usually gives much better answers.

In my testing on 30 notes, RSE didn't show clear improvement. The notes were short (around 3 chunks each), so regular top-k already gets most of the content. RSE is built for longer notes where the answer spans multiple chunks. I'll tune the parameters on a larger corpus during the coding period.
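For a single note, the "best contiguous segment" step reduces to a maximum-subarray problem over (score - penalty) values, one per chunk in chunk_index order. A sketch of that step (the penalty value is a tunable assumption and the function assumes the note has at least one chunk):

```typescript
interface Segment { start: number; end: number; value: number; } // end is inclusive

// Find the contiguous run of chunks (ordered by chunk_index) that maximizes
// sum(score[i] - penalty). The penalty stops segments from absorbing
// barely-relevant chunks. This is Kadane's algorithm with run tracking.
function bestSegment(scores: number[], penalty = 0.2): Segment {
  let best: Segment = { start: 0, end: 0, value: scores[0] - penalty };
  let curStart = 0;
  let curValue = 0;
  for (let i = 0; i < scores.length; i++) {
    const v = scores[i] - penalty;
    if (curValue <= 0) {
      curStart = i;      // restart the run: a negative prefix never helps
      curValue = v;
    } else {
      curValue += v;
    }
    if (curValue > best.value) best = { start: curStart, end: i, value: curValue };
  }
  return best;
}
```

Given scores [0.1, 0.9, 0.8, 0.05], the segment picked is chunks 1–2: the weak chunks at the edges are excluded because they cost more in penalty than they add in relevance.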

4.2.2 Hybrid Search

Using vector search alone works well for meaning, but it can miss exact terms: names, tools, or specific words don’t always match well in embeddings. To fix this, I also plan to use Joplin’s built-in search (BM25), which is good at exact matches because it works on keywords.

So I plan to run two searches together:

  • Vector search (finds similar meaning)
  • BM25 search (finds exact word matches)

I can combine both result lists using Reciprocal Rank Fusion (RRF).

RRF works by looking at positions in both lists: if something appears in both, it gets pushed higher; if it appears in only one, it still shows up, just a bit lower. After merging, the results are passed to RSE, which then builds proper segments.

One thing I found while testing: if one retriever is much stronger than the other, equal-weight RRF can actually hurt. Vector search alone got 0.98 MRR in my evaluation, but after merging with BM25 (0.85 MRR) it dropped to 0.93. So in JIVA I'll use weighted fusion where vector gets more weight and BM25 acts as a backup for exact keyword matches.
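Weighted RRF itself is only a few lines. A sketch, where the per-retriever weights and k = 60 are common defaults rather than fixed decisions:

```typescript
// Reciprocal Rank Fusion: each result contributes weight / (k + rank) to its
// fused score, where rank is its 1-based position in that retriever's list.
function rrfFuse(
  rankings: { ids: string[]; weight: number }[],
  k = 60,
): string[] {
  const scores = new Map<string, number>();
  for (const { ids, weight } of rankings) {
    ids.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + i + 1));
    });
  }
  // Highest fused score first.
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}
```

With weight 2 for the vector list and weight 1 for BM25, a chunk ranked first by vector search outscores one found only by BM25, which is the weighted fusion described above.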

4.2.3 Reranking (Optional)

After RSE returns the top segments, there’s still one small issue: the ranking is based on similarity and keywords, but it doesn’t really check whether a segment actually answers the question. To fix this, I will use a reranker (a cross-encoder). It reads the query and each segment together and scores how well they match. So instead of just “similar text”, it checks “does this actually answer the question”.

I’m using ms-marco-MiniLM-L-6-v2, which runs locally using Transformers.js (the same setup as the embeddings). It’s around 80 MB and adds about 100–200ms per query, which feels acceptable.

I plan to keep it optional, and it will be off by default (for faster responses). If someone wants better answers, they can turn it on.

One problem is that this model is trained on web data. So sometimes it might prefer clean, well-written text over rough personal notes, even if the rough ones are more useful. I will test this more with real notes during the community bonding period and discuss with the mentor.

Query decomposition (breaking a multi-part question into smaller queries and searching for each one separately) gives better results, but I'm not sure I can complete it in the given timeframe. I want to first make sure the core pipeline is stable, and then build this on top of it. I can discuss this with the mentor and adjust the plan accordingly.

4.3 Prompt Construction

The prompt builder will assemble three layers into the message array sent to the LLM:

  • System instruction: defines the assistant's role and the grounding constraint (answer only from the notes provided).
  • Retrieved note excerpts: formatted with XML tags including title, heading path, and tags.
  • Conversation history: generally the last 5 turns (configurable), trimmed oldest-first when the total token budget would be exceeded.
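The three layers above can be sketched as a pure function. The token estimate, the XML tag names, and the budget defaults are illustrative assumptions, not final choices:

```typescript
interface Msg { role: "system" | "user" | "assistant"; content: string; }
interface Excerpt { title: string; headingPath: string; text: string; }

// Very rough token estimate (~4 chars per token); a real build would use
// the provider's tokenizer.
const approxTokens = (s: string): number => Math.ceil(s.length / 4);

function buildMessages(
  excerpts: Excerpt[],
  history: Msg[],        // prior turns, oldest first
  question: string,
  maxTurns = 5,          // "last 5 turns" default from the text above
  historyBudget = 1500,  // assumed token budget for history
): Msg[] {
  // Layer 1: system instruction with the grounding constraint.
  const system: Msg = {
    role: "system",
    content:
      "You are JIVA, an assistant for the user's Joplin notes. " +
      "Answer only from the note excerpts provided.",
  };
  // Layer 2: retrieved excerpts wrapped in XML-style tags with metadata.
  const context = excerpts
    .map(e => `<note title="${e.title}" path="${e.headingPath}">\n${e.text}\n</note>`)
    .join("\n");
  // Layer 3: recent turns, dropping the oldest while over the budget.
  let turns = history.slice(-maxTurns);
  while (
    turns.length > 0 &&
    turns.reduce((n, m) => n + approxTokens(m.content), 0) > historyBudget
  ) {
    turns = turns.slice(1);
  }
  return [system, ...turns, { role: "user", content: `${context}\n\n${question}` }];
}
```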

4.4 Chat Panel Interface

I will build a ChatGPT-style UI where users can interact with JIVA through a sidebar chat panel implemented using the Joplin panel API. The webview loads app.tsx, a React application that renders the conversation UI and communicates with the plugin backend through bidirectional IPC.

IPC Message Protocol
Communication between the panel UI and the plugin backend happens through Joplin’s panel messaging system ( postMessage / onMessage) provided by the Joplin Plugin API.

| Direction | type field | Payload fields | Description |
|---|---|---|---|
| Panel → Plugin | query | text, filterNotebook | User submits a question |
| Plugin → Panel | token | text (partial token) | Streaming response chunk |
| Plugin → Panel | answer | text, sources[] | Final answer with citation list |
| Plugin → Panel | error | message, code | LLM failure, index empty, etc. |
| Panel → Plugin | openNote | noteId | Open the referenced note |

Messages are sent from the panel to JIVA's backend using the panel messaging system, and responses are streamed back to the interface.
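The protocol above can be pinned down as a TypeScript discriminated union shared by the panel and the backend, so both sides agree on message shapes at compile time. Field names follow the table; the router is a minimal sketch with the real handlers (retrieval pipeline, note opening) injected:

```typescript
// Messages from the panel webview to the plugin backend.
type PanelToPlugin =
  | { type: "query"; text: string; filterNotebook?: string }
  | { type: "openNote"; noteId: string };

// Messages from the plugin backend to the panel webview.
type PluginToPanel =
  | { type: "token"; text: string }
  | { type: "answer"; text: string; sources: { noteId: string; title: string }[] }
  | { type: "error"; message: string; code: string };

// Example of a well-typed backend-to-panel message.
const sampleAnswer: PluginToPanel = {
  type: "answer",
  text: "See your OS notes.",
  sources: [{ noteId: "abc123", title: "Operating system" }],
};

// Backend-side dispatch; in the plugin the handlers would be wired to the
// retrieval pipeline and to Joplin's note-opening command (omitted here).
function handlePanelMessage(
  msg: PanelToPlugin,
  handlers: {
    onQuery: (text: string, filterNotebook?: string) => void;
    onOpenNote: (noteId: string) => void;
  },
): void {
  switch (msg.type) {
    case "query":
      handlers.onQuery(msg.text, msg.filterNotebook);
      break;
    case "openNote":
      handlers.onOpenNote(msg.noteId);
      break;
  }
}
```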

4.5 Indexing Lifecycle

Once the plugin starts, the indexing system runs continuously in the background. When JIVA runs for the first time, it will build an index of all notes; after that, only modified notes are reprocessed.

JIVA will also listen for note change events. When a note is edited or deleted, the index is automatically updated to stay consistent with the user's collection.

Users can also manually trigger a full re-index from JIVA’s settings.

4.6 Error Handling

JIVA will also include several safeguards to handle common failures.

| Failure Scenario | Detection | Behaviour |
|---|---|---|
| ONNX model download fails | Model fetch error | Show error message and allow cloud embeddings |
| Invalid API key / quota exceeded | API returns 401 or 429 | Show error and guide user to fix API key |
| Ollama not running | testConnection() returns false | Show 'Ollama is not reachable' with setup instructions |
| Empty index (no notes yet) | Vector search returns 0 rows | Return a friendly 'Please index your notes first' message |

4.7 First-Run Experience

First impressions matter, so JIVA will keep things simple at the start. It will not ask users to set up too many things before it works.

When the user opens it for the first time, they see a welcome screen that guides them to go to Tools → Options → Plugins, where they can set things up based on what they need.

In this section, users can:

  • choose which provider to use (OpenAI, Gemini, or Ollama)
  • enter API keys if needed
  • or connect to a local setup

Indexing (runs automatically)

As soon as the plugin is installed, indexing starts in the background.
Since all-MiniLM runs locally, no API key is needed for this part. The user doesn’t have to wait; they can already explore the UI.

There’s a simple progress bar showing how many notes are left. For around 500 notes, it usually takes 3-4 minutes.

If the user tries to ask something before it’s done, JIVA just shows a friendly message like:
“Still indexing… X notes remaining”


Project Timeline

Community Bonding

I will dive deeper into plugin development and go through the source code of various plugins. Any scope adjustments based on mentor feedback will also be discussed during this period.

Week 1–2 (Infrastructure and Basic UI)

I will focus on setting up the plugin foundation and getting a basic working interface in place. This includes creating the plugin scaffold, implementing the settings page, and integrating the Joplin Data API for note retrieval. At the same time, I will set up Vectra with the initial database schema and write the core store helper functions. A basic chat panel UI will also be built so there is a visible interface to work with from early on.

Outcome: Plugin starts with a basic chat panel, notes are fetched correctly, and the database initializes correctly.

Week 3–4 (Indexing System)

These two weeks will focus on building the note ingestion and indexing pipeline using the chunking and embedding approach described in the technical design. The full pipeline will be assembled and tested end-to-end, and incremental change detection will be added so only modified notes are re-indexed in later runs.

Outcome: All notes are chunked, embedded, and queryable in Vectra.

Week 5–6 (RSE and Retrieval System)
I will focus on RSE: implementing the ANN candidate pool, score grouping by note using chunk_index, and the optimal-segment formula with a configurable penalty. I will also build hybrid search: integrating Joplin's BM25 search via joplin.data.get(['search']), mapping note-level results back to chunks, and merging both lists using RRF.

Outcome: Full retrieval pipeline working (hybrid search → RRF → RSE) and LLM providers connected.

Midterm Evaluation (July 7, 2026)

Week 7–8 (Chat Interface and Streaming)

I will wire the full pipeline end-to-end and test it with real API keys against a Joplin notebook. The SourceCard component will be built so cited notes are clickable. I will also add streaming responses so answers appear token by token, add multi-turn conversation with rolling history, and polish the UI with loading indicators, error states, dark mode support, and Joplin theme integration.

Outcome: Full working pipeline with a polished chat interface, streaming responses, multi-turn conversation.

Week 9–10 (Local Models and Incremental Updates)

I will add Ollama support so the plugin can work fully offline with local models. Graceful error handling will be added for common failures like API errors, rate limits, and an empty index. I will also wire up live re-indexing so the index stays in sync as notes are edited, and add tag and notebook filtering to retrieval queries. Performance will be tested on a large note collection and optimized where needed. If time allows, the optional reranker using ms-marco-MiniLM-L-6-v2 will also be integrated in this phase as a settings toggle.

Outcome: JIVA works fully offline with local models and stays in sync as notes are edited.

Week 11–12 (Testing, Documentation and Release)

The final two weeks will focus on making the plugin stable and ready for users. I will write integration tests for the full pipeline, test the plugin on Windows, macOS, and Linux, and handle edge cases like very large notes or notes with little text. User documentation will be written covering installation, configuration, and provider setup, followed by code cleanup before submission.

Outcome: JIVA will be ready for release with all features implemented and documented.

Final Evaluation (August 24, 2026)


5. Deliverables

  • A Joplin plugin (JIVA) that lets users chat with their notes directly inside the app.
  • A sidebar chat UI where users can ask questions and get answers with clickable sources.
  • A search system that uses both meaning (vector search) and exact words (BM25), so results are more reliable.
  • RSE (Relevant Segment Extraction) to return proper sections of notes instead of random small chunks.
  • Local embeddings by default using all-MiniLM-L6-v2 with Contextual Chunk Headers (CCH), so indexing happens entirely on the user's device with no data sent externally.
  • Multiple model support: OpenAI, Gemini (including the free tier), and Ollama for fully offline use.
  • Clickable citations in answers so users can jump to the exact note.
  • Automatic re-indexing when notes are updated or deleted.
  • An optional reranker for better answers (can be turned on if needed).

6. Availability

Weekly availability: I can dedicate 40–50 hours per week during GSoC and am available for meetings or check-ins on weekends if needed.
Time zone: I am in IST (Indian Standard Time) and flexible with scheduling calls or discussions.
Other commitments: I have my end-semester exams from May 1st to May 15th, which coincides with the community bonding period. During this time, I will be able to commit 3–4 hours per day to the project.
Communication Plan: Weekly async progress report posted to the Joplin forum thread.

AI Assistance Disclosure
I used AI to help with grammar and wording while writing this proposal. The technical content, architecture decisions, and code are all my own. I also used it to go through a lot of research material.


Hello, we've recently updated the template for GSoC draft proposals. Please update your post as described here:


I have updated my proposal as stated and am looking for some feedback on it.

Hey @shikuz I have addressed all the points you mentioned in this comment. Would love to hear what you think about the approach I used.

Hey @Harsh16gupta, thanks for the update and for referencing the scoping discussion.

A few questions. sqlite-vec is a C extension - have you tested whether it loads in Joplin's plugin sandbox? If not, how complete is your in-memory fallback? The AI summarisation plugin (by @HahaBill) is relevant for how it handled WASM/ONNX loading.

How does your hybrid search work in practice? Does it call Joplin's search API for keyword results and fuse them with vector results via RRF, or are you implementing BM25 separately?

The dsRAG benchmarks are on their test data. Do you have a plan for testing retrieval quality on real Joplin notes, even informally?

Thank you for the review.

I made a mistake: I only checked the efficiency and speed of sqlite-vec, but I did not actually test it inside the plugin sandbox. After your response I tried it in a plugin environment and it failed (the native C extension path via loadExtension() does not work in Joplin's sandbox).
I then started looking for alternatives and found Voy, a vector similarity search engine written in Rust and compiled to WASM. It uses a k-d tree for indexing and has zero native dependencies. I tested it inside the plugin sandbox and it works perfectly: I was able to create an index, add embeddings, and run nearest-neighbour queries without any issues.

My updated plan is:

  • Primary: Use Voy for vector similarity search. It runs in any JavaScript context including the sandboxed renderer, installs via npm with no native build step, and I have already verified it works inside a Joplin plugin.
  • Fallback: I will use Vectra, which handles up to ~100K vectors with zero native dependencies, or a simple in-memory brute-force cosine similarity search for smaller collections.
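The brute-force fallback mentioned above is simple enough to sketch directly. This is a minimal illustration, assuming embeddings are plain number arrays of equal length:

```javascript
// Cosine similarity between two dense vectors of equal length.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank every stored chunk against the query vector and keep the top k.
// Linear scan: fine for small collections, which is exactly the
// scenario this fallback targets.
function bruteForceSearch(queryVec, items, k) {
  return items
    .map(item => ({ ...item, score: cosineSimilarity(queryVec, item.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```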

I am currently benchmarking Voy against Vectra on larger datasets (1K, 3K, and 5K simulated notes) to measure index build time, query latency, and memory usage, and will post the results when done. From what I have read so far, Voy should perform better (faster) on large note collections. What do you think about using Voy?

Edit: While benchmarking Voy against Vectra, Voy did not work properly and threw errors (I was testing in a Node.js environment), even though it had worked perfectly in the plugin sandbox. So for simplicity I will use Vectra for now (as it is proven to work) and will give Voy another try during the community bonding period.

Yes, it calls Joplin's search API, and I will not implement BM25 separately. The hybrid search works as follows:
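The fusion step is Reciprocal Rank Fusion (RRF). A minimal sketch, assuming keyword results come from Joplin's search API and vector results from the embedding index; the k = 60 constant is the value commonly used in the RRF literature:

```javascript
// Fuse ranked lists of result ids with Reciprocal Rank Fusion.
// Each list contributes 1 / (k + rank) per id, so ids ranked highly
// in either list (or moderately in both) float to the top.
function reciprocalRankFusion(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) || 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

For example, fusing keyword results ['n1', 'n2', 'n3'] with vector results ['n2', 'n4'] ranks 'n2' first, since it appears in both lists.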

Note: I will update the sqlite-vec section after testing Voy properly.

Yes, I mentioned the dsRAG/KITE score only to justify the RSE approach. I do not expect the same absolute numbers on Joplin notes, so I will test retrieval quality on real notes; I am planning to do that this week.

  1. Test on 50 real Joplin notes (I will ask users on the forum to share some, if they are willing) across different categories (personal notes, clipped articles, technical docs, meeting notes).
  2. Create more than 30 question-answer pairs where each question requires information from 1-3 specific notes.
  3. I will measure two standard information retrieval metrics:
  • Hit Rate@K: did the correct note appear in the top-K results?
    Hit Rate@K = (number of questions where the correct note appeared in the top K) / (total questions). This tells me whether the pipeline is finding the right notes at all.
  • Mean Reciprocal Rank (MRR): how high was the first correct result ranked? For each question,
    the reciprocal rank = 1 / position_of_first_correct_result, and MRR is the average reciprocal rank over all questions. Higher MRR means the system ranks the right notes near the top.
  4. To verify that each component of the retrieval pipeline is actually contributing, I will run all the questions through different configurations (vector-only vs BM25-only vs hybrid (RRF) vs hybrid + RSE, with and without CCH) and compare scores.
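The two metrics above can be computed mechanically. A minimal sketch, assuming each evaluation case pairs the set of correct note ids with the ranked list the pipeline returned (the data shape is my own, for illustration):

```javascript
// cases: [{ relevant: Set<string>, ranked: string[] }, ...]

// Fraction of questions where a correct note appears in the top K.
function hitRateAtK(cases, k) {
  const hits = cases.filter(c =>
    c.ranked.slice(0, k).some(id => c.relevant.has(id))).length;
  return hits / cases.length;
}

// Average of 1 / (rank of first correct result); 0 if never found.
function meanReciprocalRank(cases) {
  const total = cases.reduce((sum, c) => {
    const pos = c.ranked.findIndex(id => c.relevant.has(id));
    return sum + (pos === -1 ? 0 : 1 / (pos + 1));
  }, 0);
  return total / cases.length;
}
```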

If RSE is helping, I will see a clear improvement in Hit Rate and MRR when it is turned on versus off; the same applies to CCH. If a component does not improve scores, I will either remove it or investigate why.
Will share results when completed.

Good testing work on the sandbox constraints. Voy is worth exploring during community bonding. Vectra as default for now makes sense.

Mobile isn't in scope for v1, but does your architecture have a path to it, or do current choices close it off?

I did go through it and ran the embedding models (BGE-small-en-v1.5 and all-MiniLM-L6-v2) with Transformers.js in the plugin environment.

My findings: BGE-small takes roughly twice as long to embed the same notes as all-MiniLM-L6-v2. Both models share the same max token limit (512) and output dimension (384), so the quality loss is minimal while the speed gain is very large. (The token limit is slightly confusing for MiniLM; I could not find a definitive statement of it.)

I have added all the images of the results in the GitHub repo readme: testing-embedding-model

I have explained it in more detail: here

The architecture does not close it off, but one thing would need to change: the vector storage. I am currently using Vectra, which stores embeddings as local files on disk. Mobile plugins don't have filesystem access, so that would break. To fix this I can use Joplin's built-in data API, which stores data directly on notes.

Other barrier:
Ollama will not work on mobile, but OpenAI and Gemini will work perfectly.


Good follow-up on the AI summarisation plugin reference. Running both models via Web Worker at different scales is useful data. One thing that jumped out: MiniLM's per-note time stays flat as the corpus grows, but BGE-small's doesn't scale linearly, climbing from ~486ms at 125 notes to ~760ms at 1500.

Good luck with the submission!
