GSoC 2026 Proposal Draft – Idea 4: Chat with your note collection using AI – Payel Manna

Links

1. Introduction

Background / Studies

I am a Computer Science student at Scaler School Of Technology, Bengaluru, India. My academic background includes operating systems, computer networking, classical machine learning, and databases (MySQL, PostgreSQL).

Programming Experience

I primarily work with TypeScript, JavaScript, React, Node.js, Python, and Java.

Some relevant projects:

  • EduCompanion — Full-stack MERN application with API design and database integration
  • LearnFlow — React-based frontend demonstrating component architecture and UI design
  • TrueTribe — Python/Flask backend system

I have also completed a Coursera certification in Agentic AI, covering:

  • LLM orchestration
  • Memory systems
  • RAG pipelines

Open source experience: My Joplin contribution is PR #14779, which adds a real-time search field to the desktop Config screen. This gave me direct experience with Joplin's React component architecture, config-shared.js, the SearchInput API, ESLint conventions, the CI pipeline (12 checks across Ubuntu, macOS, Windows), and the Playwright integration test suite. I also have two merged Hacktoberfest PRs. Before writing this proposal I studied the Jarvis plugin source to understand panel IPC and the constraints of the plugin sandbox. Having already contributed to Joplin and studied its plugin architecture, I can start implementation immediately without a ramp-up phase.

2. Project Summary

The Problem

Joplin users often build large, carefully curated knowledge bases containing hundreds or even thousands of notes. However, the current keyword-based search is limited — it cannot handle semantic queries such as:

“Summarise my notes on distributed systems”
“What did I write about async patterns last month?”

It also cannot surface connections across multiple notes. As a result, users must manually browse and recall where information is stored — a process that becomes increasingly inefficient as the collection grows.

What Will Be Built

I propose to build ARIA (Adaptive Retrieval and Intelligence Assistant) — a Joplin plugin that enables users to interact with their notes through a ChatGPT-style sidebar interface.

With ARIA:

  • The user asks a natural-language question
  • The system retrieves the relevant notes
  • The system synthesises a grounded answer across multiple notes, with source citations

The system supports multi-turn conversations, allowing users to ask follow-up questions while the system improves its answers based on prior interactions.

What Makes This Different from Basic RAG

ARIA goes beyond a simple retrieve → generate pipeline by introducing three key innovations:

  • Persistent Context Memory
    Remembers past sessions, learns user preferences, and builds a topic graph over time

  • Multi-Stage Retrieval Pipeline
    Combines BM25 keyword search with semantic retrieval, followed by context-aware re-ranking

  • Cross-Note Synthesis
    Identifies overlapping topics across notes and structures responses accordingly

Privacy Model

ARIA is designed with a privacy-first architecture:

  • All embeddings and indexing are performed locally on-device
  • Ollama enables fully offline operation
  • OpenAI-compatible APIs are opt-in only
  • Only relevant note excerpts are sent — never the full collection
  • All memory (conversation, user preferences, topic graph) is stored in local SQLite
  • No data is synced to Joplin Cloud

Why a Plugin

A plugin has direct access to the joplin.data API without requiring the user to run a separate server or configure OAuth. An external application would require either a local API server or Joplin's Web Clipper service to be running — adding friction and a persistent background process.
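To make the difference concrete, here is a minimal sketch of in-process note access using the standard plugin entry point (the field list is illustrative, not the final query):

```ts
import joplin from 'api';

joplin.plugins.register({
	onStart: async () => {
		// One paginated call fetches notes directly from the running Joplin instance —
		// no Web Clipper service, token exchange, or separate server required.
		const page = await joplin.data.get(['notes'], { fields: ['id', 'title'], limit: 50, page: 1 });
		console.info(`ARIA: fetched ${page.items.length} notes (has_more: ${page.has_more})`);
	},
});
```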

The existing Jarvis plugin demonstrates that advanced local AI features are feasible within Joplin’s plugin environment. ARIA builds on this foundation with a more structured and scalable RAG + memory architecture.

3. Technical Approach

Architecture Overview

The system is organised into five layers:

  • Ingestion Layer — Fetches and indexes notes locally from Joplin
  • Storage Layer — Stores embeddings and memory in SQLite
  • Query Layer — Retrieves relevant chunks and interacts with the LLM
  • Memory Layer — Enriches each query with session context and learned signals
  • UI Layer — Streams responses to a React-based sidebar panel
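The following sketch shows how these layers might be bounded in code. The names and signatures are illustrative assumptions, not final APIs:

```ts
// Illustrative layer boundaries only.
interface NoteChunk { noteId: string; headingPath: string; text: string; }
interface ScoredChunk extends NoteChunk { score: number; }

interface IngestionLayer {
	indexAll(onProgress: (done: number, total: number) => void): Promise<void>;
}
interface StorageLayer {
	upsertChunks(chunks: NoteChunk[], vectors: number[][]): Promise<void>;
	nearest(queryVector: number[], k: number): Promise<ScoredChunk[]>;
}
interface MemoryLayer {
	enrich(question: string): Promise<string>;              // context from past sessions
	record(question: string, answer: string): Promise<void>;
}
interface QueryLayer {
	ask(question: string): AsyncIterable<string>;           // streamed tokens for the UI layer
}
```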

Embedding Model

The system uses BGE-small-en-v1.5 (24 MB, MTEB retrieval ~51.7) via ONNX runtime.

  • Chosen over BGE-large (335 MB, ~54.3) because a ~3 point quality gain does not justify a ~14× increase in size
  • Runs entirely on-device — no external server required
  • Cloud embedding via text-embedding-3-small is supported as opt-in
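As a sketch of how local embedding could look, the snippet below uses Transformers.js (@xenova/transformers), which runs ONNX models in Node/Electron. The model id and function names are assumptions, not a committed implementation:

```ts
import { pipeline } from '@xenova/transformers';

// Load the feature-extraction pipeline once and reuse it across batches.
const embedderPromise = pipeline('feature-extraction', 'Xenova/bge-small-en-v1.5');

export async function embedChunks(texts: string[]): Promise<number[][]> {
	const embedder = await embedderPromise;
	// Mean pooling + L2 normalisation -> 384-dimensional unit vectors
	// suitable for cosine / inner-product search.
	const output = await embedder(texts, { pooling: 'mean', normalize: true });
	return output.tolist() as number[][];
}
```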

Chunking Strategy

  • Notes are first split at H1 / H2 / H3 heading boundaries
  • Each section is further divided using a 400-token sliding window with 50-token overlap
  • Sections under 100 tokens are merged with adjacent sections

This hybrid strategy works well for both:

  • Structured notes (with headings)
  • Unstructured notes (free-form text)
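A minimal sketch of this chunker is shown below. Token counts are approximated by whitespace-separated words here; the real implementation would use the embedding model's tokenizer:

```ts
interface Chunk { headingPath: string; text: string; }

const WINDOW = 400;       // tokens per chunk
const OVERLAP = 50;       // tokens shared between consecutive chunks
const MIN_SECTION = 100;  // sections below this are merged into the previous one

export function chunkNote(markdown: string): Chunk[] {
	// 1. Split at H1/H2/H3 boundaries, keeping each heading with its body.
	const sections = markdown.split(/^(?=#{1,3}\s)/m).filter(s => s.trim().length > 0);

	// 2. Merge short sections into the preceding one.
	const merged: string[] = [];
	for (const section of sections) {
		const tokens = section.split(/\s+/).length;
		if (merged.length > 0 && tokens < MIN_SECTION) {
			merged[merged.length - 1] += '\n' + section;
		} else {
			merged.push(section);
		}
	}

	// 3. Slide a 400-token window with 50-token overlap over each section.
	const chunks: Chunk[] = [];
	for (const section of merged) {
		const heading = section.match(/^#{1,3}\s+(.*)$/m)?.[1] ?? '';
		const words = section.split(/\s+/);
		for (let start = 0; start < words.length; start += WINDOW - OVERLAP) {
			chunks.push({ headingPath: heading, text: words.slice(start, start + WINDOW).join(' ') });
			if (start + WINDOW >= words.length) break;
		}
	}
	return chunks;
}
```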

Storage Layer

The system uses sqlite-vec for vector similarity search.

  • Fully compatible with existing SQLite usage
  • No external services or processes required

Tables:

  • embeddings — vector representations of chunks
  • chunks — note text with metadata
  • index_state — incremental sync tracking via content hashes
  • memory_conversation, memory_user, memory_topic_graph — persistent memory stores
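A sketch of the intended schema is below. Column names are illustrative and subject to change; the vec0 virtual-table syntax follows sqlite-vec's documentation and will be validated during bonding:

```ts
export const SCHEMA_SQL = `
CREATE TABLE IF NOT EXISTS chunks (
	id           INTEGER PRIMARY KEY,
	note_id      TEXT NOT NULL,
	heading_path TEXT,
	body         TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS index_state (
	note_id      TEXT PRIMARY KEY,
	content_hash TEXT NOT NULL,   -- SHA-256 of the note body
	indexed_at   INTEGER NOT NULL
);

-- sqlite-vec virtual table holding one 384-d vector per chunk.
CREATE VIRTUAL TABLE IF NOT EXISTS embeddings USING vec0(
	chunk_id  INTEGER PRIMARY KEY,
	embedding FLOAT[384]
);

CREATE TABLE IF NOT EXISTS memory_conversation (
	id INTEGER PRIMARY KEY, session_id TEXT, role TEXT, content TEXT, created_at INTEGER
);
CREATE TABLE IF NOT EXISTS memory_user (
	key TEXT PRIMARY KEY, value TEXT, updated_at INTEGER
);
CREATE TABLE IF NOT EXISTS memory_topic_graph (
	topic_a TEXT, topic_b TEXT, weight REAL, PRIMARY KEY (topic_a, topic_b)
);
`;
```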

Native Dependency Risk

sqlite-vec is a native C++ extension and must be validated against Electron’s ABI during community bonding.

  • If incompatible, fallback to vectra (pure JavaScript, file-backed JSON vector store)
  • hnswlib-node is not used as fallback (also native C++ → same risk)

Even in the worst case where ONNX or vector search is unavailable, the system falls back to BM25-based retrieval with LLM summarisation, ensuring a functional feature is always delivered.
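For reference, the vectra fallback path could look roughly like the sketch below. The LocalIndex method names are taken from my reading of vectra's documentation and should be verified against the bundled version; the data-directory handling is an assumption:

```ts
import path from 'path';
import { LocalIndex } from 'vectra';

// dataDir would come from joplin.plugins.dataDir() in the real plugin.
export function createFallbackStore(dataDir: string) {
	const index = new LocalIndex(path.join(dataDir, 'aria-vectors'));
	return {
		async init() {
			if (!(await index.isIndexCreated())) await index.createIndex();
		},
		async add(chunkId: number, vector: number[], noteId: string) {
			await index.insertItem({ vector, metadata: { chunkId, noteId } });
		},
		async topK(queryVector: number[], k = 20) {
			return index.queryItems(queryVector, k); // cosine-ranked results
		},
	};
}
```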

RAG Query Flow

  1. Load memory context (conversation history, user knowledge, topic graph)
  2. Embed user query via BGE-small ONNX → 384-dimensional vector
  3. Perform hybrid retrieval:
    • BM25 keyword search
    • Semantic vector search
      → union of ~20 candidates
  4. Apply context-aware re-ranking:
    • Recency
    • Note importance
    • Prior interactions
    • Memory signals
      → select top-5 diverse chunks
  5. Perform cross-note synthesis to detect shared topics and structure context
  6. Stream LLM response → extract key facts → update memory stores
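Steps 3–4 are the core of the pipeline, so here is a condensed sketch of the union-and-re-rank logic. The scoring weights, decay constant, and the memoryBoost helper are illustrative assumptions to be tuned during evaluation:

```ts
interface Candidate {
	chunkId: number;
	noteId: string;
	bm25: number;        // normalised 0..1
	cosine: number;      // normalised 0..1
	updatedTime: number; // note mtime, for the recency signal
}

export function hybridRank(
	bm25Hits: Candidate[],
	vectorHits: Candidate[],
	memoryBoost: (noteId: string) => number, // 0..1 signal from the memory stores
	now = Date.now(),
): Candidate[] {
	// Union by chunkId, keeping the best score from each retriever.
	const byId = new Map<number, Candidate>();
	for (const hit of [...bm25Hits, ...vectorHits]) {
		const existing = byId.get(hit.chunkId);
		byId.set(hit.chunkId, existing
			? { ...existing, bm25: Math.max(existing.bm25, hit.bm25), cosine: Math.max(existing.cosine, hit.cosine) }
			: hit);
	}

	// Context-aware re-ranking: lexical + semantic + recency + memory signals.
	const score = (c: Candidate) => {
		const ageDays = (now - c.updatedTime) / 86_400_000;
		const recency = Math.exp(-ageDays / 90); // decays over roughly three months
		return 0.35 * c.cosine + 0.3 * c.bm25 + 0.15 * recency + 0.2 * memoryBoost(c.noteId);
	};

	return [...byId.values()].sort((a, b) => score(b) - score(a)).slice(0, 5);
}
```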

Incremental Delivery Strategy

The system is designed to be delivered in stages to ensure a functional result at all times.

  • By midterm, a complete hybrid RAG pipeline (BM25 + semantic retrieval + answer generation) will be fully operational
  • Memory and synthesis layers are added incrementally on top of this stable core
  • If advanced components are delayed, the system gracefully degrades to a fully working retrieval + summarisation pipeline

This ensures that a usable and valuable feature is always delivered regardless of complexity trade-offs.

Memory Layer

Three SQLite-backed memory stores:

  • memory_conversation
    Stores past Q&A and extracted key facts for multi-turn continuity

  • memory_user
    Stores inferred expertise and preferences
    Example:
    "topic:os_scheduling" → "advanced"

  • memory_topic_graph
    Stores relationships between topics discovered across sessions

All memory is:

  • Stored locally
  • User-clearable
  • Never sent to external APIs
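As a sketch of how these stores would feed back into a query, the snippet below assembles a memory-enriched system preamble. The shapes and field names are assumptions:

```ts
interface MemoryContext {
	recentTurns: { question: string; answer: string }[]; // from memory_conversation
	userSignals: Record<string, string>;                 // from memory_user, e.g. "topic:os_scheduling" -> "advanced"
	relatedTopics: string[];                             // from memory_topic_graph
}

export function buildSystemPreamble(memory: MemoryContext): string {
	const expertise = Object.entries(memory.userSignals)
		.map(([key, level]) => `${key.replace('topic:', '')}: ${level}`)
		.join(', ');
	return [
		'Answer using only the provided note excerpts and cite each source.',
		expertise ? `The user has shown expertise levels: ${expertise}.` : '',
		memory.relatedTopics.length ? `Related topics from earlier sessions: ${memory.relatedTopics.join(', ')}.` : '',
	].filter(Boolean).join('\n');
}
```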

Performance Architecture

Targets are based on preliminary benchmarking of BGE-small ONNX during prototype validation. Actual performance may vary with hardware; these targets represent expected performance on mid-range systems.

  • Indexing runs in a Node.js worker thread → zero UI blocking
  • ONNX batch inference (32 chunks per pass) → ~20× faster than sequential
  • Incremental sync uses SHA-256 hashing → unchanged notes skipped
  • 2-second debounce on note.onChange events
  • Notes fetched in pages of 50 → stable RAM usage
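The incremental-sync check and debounce are small enough to sketch directly; helper names here are illustrative:

```ts
import { createHash } from 'crypto';

const sha256 = (text: string) => createHash('sha256').update(text, 'utf8').digest('hex');

// Skip re-embedding when the stored hash matches the current note body.
export function needsReindex(noteBody: string, storedHash: string | undefined): { changed: boolean; hash: string } {
	const hash = sha256(noteBody);
	return { changed: hash !== storedHash, hash };
}

// Collapse bursts of note.onChange events into a single re-index per note.
const timers = new Map<string, ReturnType<typeof setTimeout>>();
export function scheduleReindex(noteId: string, reindex: (id: string) => void, delayMs = 2000) {
	const existing = timers.get(noteId);
	if (existing) clearTimeout(existing);
	timers.set(noteId, setTimeout(() => { timers.delete(noteId); reindex(noteId); }, delayMs));
}
```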

Performance Targets

  • First-run index (1000 notes): ~40 seconds
  • Incremental update per note: < 1 second
  • RAM usage (idle): < 150 MB
  • Retrieval latency: < 200 ms
  • UI thread blocking: zero

Key Challenges and Mitigations

  • sqlite-vec ABI mismatch in Electron → validate during the bonding period; fall back to vectra (pure JS)
  • ONNX runtime in the plugin sandbox → validate early; fall back to Ollama nomic-embed-text
  • Context window overflow → token budget: trim the oldest conversation turns first, then the lowest-ranked chunks (a sketch follows this list)
  • Ollama not installed → auto-detect on launch; onboarding guidance
  • Topic extraction quality → use heading paths (human-curated) plus stop-word filtering
  • Incorrect cross-note associations → conservative thresholds plus hedged prompt language
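The token-budget mitigation, sketched below, uses a rough 4-characters-per-token estimate; the real implementation would use the model's tokenizer:

```ts
const approxTokens = (s: string) => Math.ceil(s.length / 4);

export function fitToBudget(
	turns: string[],   // conversation turns, oldest first
	chunks: string[],  // retrieved chunks, highest-ranked first
	budget: number,    // max prompt tokens available for context
): { turns: string[]; chunks: string[] } {
	const keptTurns = [...turns];
	const keptChunks = [...chunks];
	const total = () =>
		keptTurns.reduce((n, t) => n + approxTokens(t), 0) +
		keptChunks.reduce((n, c) => n + approxTokens(c), 0);

	while (total() > budget && keptTurns.length > 0) keptTurns.shift(); // drop oldest turns first
	while (total() > budget && keptChunks.length > 1) keptChunks.pop(); // then lowest-ranked chunks
	return { turns: keptTurns, chunks: keptChunks };
}
```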

The primary success criterion is a robust and usable RAG pipeline integrated into Joplin.
All advanced features (memory, synthesis, re-ranking improvements) are layered enhancements and will not compromise core delivery.

4. Implementation Plan

Each phase ends with a working, testable feature to ensure continuous integration and feedback.

Community bonding (May 1–26): Validate sqlite-vec and ONNX runtime inside Electron. Build working prototype — index 20 notes, retrieve top-3, confirm vectra fallback. Commit to public GitHub. Align with mentors on scope and priorities.

A minimal prototype validating embedding and retrieval on a small note set will be completed during the community bonding period and shared publicly.

Week 1–2 (May 27–Jun 9) — Foundation: Plugin scaffold, TypeScript build, settings page, SQLite database with all six tables, basic React chat panel with IPC. Deliverable: Plugin installs, database initialises, notes fetch correctly.

Week 3–4 (Jun 10–Jun 23) — Ingestion pipeline: Worker thread indexer, SHA-256 incremental sync, 2-second debounce on note.onChange, heading-first chunking, BGE-small batch ONNX embedding, WAL bulk writes, progress bar in panel. Deliverable: 1000 notes indexed in ~40 seconds, incremental updates under 1 second.

Week 5–6 (Jun 24–Jul 7) — RAG query engine: BM25 index, semantic vector search, hybrid union, basic re-ranking, grounding prompt, OpenAI and Ollama providers with streaming, source citation cards with openNote. Deliverable: End-to-end RAG working, user receives streamed answer with clickable citations.

Week 7–8 (Jul 8–Jul 21) — Memory layer: Conversation memory store, key fact extraction, user knowledge profiling, topic graph construction, memory-enriched prompt assembly, token budget management, memory settings panel. Deliverable: ARIA remembers sessions, adapts to user. Midterm target.

Week 9–10 (Jul 22–Aug 4) — Synthesis, filtering, resilience: Cross-note synthesiser, full re-ranking with all four signals, notebook and tag filtering, note.onDelete handler, full error handling matrix, Joplin theme integration. Deliverable: Second brain synthesis, notebook filtering, all failure cases handled gracefully.

Week 11–12 (Aug 5–Aug 19) — Testing, performance, documentation: Full unit and integration test suite, cross-platform testing (Windows/macOS/Linux), performance benchmarks logged and committed, four user guides (installation, Ollama, OpenAI, privacy FAQ), developer guide, code cleanup, release tag v0.1.0-gsoc. Deliverable: All performance targets met, full test suite passing, plugin documented and released.

The system is designed incrementally — a fully functional hybrid RAG pipeline will be completed by midterm, with memory and synthesis layers added progressively if time permits.

Weekly progress updates and incremental demos will be shared with mentors to ensure alignment and early feedback.

Timeline Summary

  • Community Bonding (May 1 – May 26): Prototype on GitHub
  • Week 1–2 (May 27 – Jun 9): Plugin foundation
  • Week 3–4 (Jun 10 – Jun 23): Ingestion pipeline
  • Week 5–6 (Jun 24 – Jul 7): RAG query engine
  • Week 7–8 (Jul 8 – Jul 21): Memory layer (midterm)
  • Week 9–10 (Jul 22 – Aug 4): Synthesis + resilience
  • Week 11–12 (Aug 5 – Aug 19): Testing + documentation + release

5. Deliverables

Core (Guaranteed by Midterm)

  • End-to-end RAG pipeline:
    • Note ingestion and chunking
    • Hybrid retrieval (BM25 + semantic search)
    • Context-aware re-ranking
    • Streaming LLM responses with source citations
  • Local-first embedding using BGE-small via ONNX (no external dependency required)
  • Incremental sync using SHA-256 hashing with event-driven updates
  • Support for:
    • Ollama (fully offline)
    • OpenAI-compatible APIs (opt-in)
  • Notebook and tag-based filtering
  • Responsive React sidebar chat panel integrated into Joplin
  • Graceful fallback mechanisms (BM25-only retrieval if embeddings unavailable)

Extended (Planned, Delivered After Core Stabilisation)

  • Persistent context memory:
    • Conversation history
    • User knowledge signals
    • Topic graph
  • Cross-note synthesis highlighting relationships across notes
  • Advanced re-ranking signals (interaction history, memory-aware boosting)

Quality

  • Unit and integration tests covering all core pipeline components
  • Cross-platform validation (Windows, macOS, Linux)
  • Performance benchmarks recorded and documented
  • User documentation:
    • Installation guide
    • Ollama setup
    • OpenAI setup
    • Privacy and data handling FAQ
  • Clean, production-quality code:
    • No use of the TypeScript any type
    • Full JSDoc coverage
    • Zero ESLint warnings

6. Availability

  • Weekly hours: 40+ hours per week throughout GSoC
  • Time zone: UTC+5:30 (IST, Bengaluru) — available for overlap with UTC and CET mentor windows
  • Exam overlap: Possible conflict with early community bonding (late April–mid May). Minimum 20 hours/week during that period, full time from May 27
  • Communication: Weekly Monday forum update (completed / planned / blockers), daily commits on public fork, available for mentor calls UTC 06:00–14:00 weekdays

Hi @laurent, looking forward to your feedback.

The proposal is really large, I can't review this unfortunately. You may want to check the feedback on other similar proposals and refine it based on that.

Hey @Payel-Manna, Laurent mentioned the proposal length was a barrier to review. Have you been working on revisions, and how are you thinking about what's core vs. stretch goal?

Your sqlite-vec fallback is hnswlib-node, but that's also a native C++ module. Would it face the same Electron plugin sandbox constraints?

Yes, you are right, hnswlib-node is also a native C++ addon and would face the exact same Electron ABI constraints as sqlite-vec. I should not have listed it as a valid fallback.

The correct pure-JS fallback is vectra, a file-backed JSON vector store with zero native dependencies. For smaller collections it is perfectly usable, and it gives us a guaranteed working baseline while sqlite-vec is being validated in the Electron environment during community bonding.

I am updating the proposal to reflect this. Thank you for catching it.

Hi @laurent, thanks for the earlier feedback.

I've significantly revised the proposal to make it more concise and focused. In particular:

  • Reduced length and removed low-level details
  • Clearly separated core deliverables vs extended features
  • Fixed the native dependency fallback (now using vectra instead of hnswlib-node)

Would really appreciate it if you could take another look when you have time. Thank you!

Thanks @shikuz for pointing out the fallback issue; you're absolutely right.

I’ve corrected the proposal:

  • Removed hnswlib-node as a fallback (native C++ constraint)
  • Added vectra as the pure JavaScript fallback

I’ve also streamlined the proposal and clearly separated core vs stretch deliverables based on your feedback.

Would really appreciate it if you could take another quick look when you have time.

How will you know whether the hybrid retrieval and re-ranking actually improve results over cosine similarity alone?

To check if hybrid retrieval + re-ranking is actually better than just cosine similarity, I plan to compare them in a simple and practical way.

First, I’ll create a small set of test queries with expected relevant notes (from sample note collections).

Then I’ll run both approaches:

  • cosine similarity only
  • hybrid (keyword + vector + re-ranking)

and compare:

  • whether the correct notes are retrieved
  • whether they appear higher in the results

I’ll also check the final answers generated using both methods to see which one is more accurate and better grounded in the notes.

Additionally, I’ll keep track of latency, since re-ranking adds extra cost, to make sure the improvement is actually worth it.

If hybrid retrieval consistently gives better results without too much slowdown, I’ll consider it a meaningful improvement.
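To make the comparison reproducible, I would compute simple retrieval metrics over the labelled query set. A rough sketch (the query/label shapes are assumptions):

```ts
interface EvalCase { query: string; relevantNoteIds: string[]; }

// Fraction of the labelled relevant notes that appear in the top-k results.
export function recallAtK(retrieved: string[], relevant: string[], k = 5): number {
	const top = new Set(retrieved.slice(0, k));
	const hits = relevant.filter(id => top.has(id)).length;
	return relevant.length ? hits / relevant.length : 0;
}

// Reciprocal rank of the first relevant note (0 if none retrieved).
export function mrr(retrieved: string[], relevant: string[]): number {
	const rank = retrieved.findIndex(id => relevant.includes(id));
	return rank === -1 ? 0 : 1 / (rank + 1);
}

// Run one retrieval mode (cosine-only or hybrid) over all cases and average.
export function evaluate(cases: EvalCase[], run: (query: string) => string[]): { recall: number; mrr: number } {
	let recallSum = 0, mrrSum = 0;
	for (const c of cases) {
		const retrieved = run(c.query);
		recallSum += recallAtK(retrieved, c.relevantNoteIds);
		mrrSum += mrr(retrieved, c.relevantNoteIds);
	}
	return { recall: recallSum / cases.length, mrr: mrrSum / cases.length };
}
```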