GSoC 2026 Proposal Draft – Idea 4: Chat with your note collection using AI – Vijay_Singh

Links

Project Idea - Link

GitHub profile - vijaysingh2219

Forum introduction post - Link

Pull requests you have submitted to Joplin
Currently none. I am preparing to contribute improvements related to plugin tooling and testing as I continue engaging with the community.

1. Introduction

I am Vijay Singh, a final-year Computer Science and Engineering student at Graphic Era University, Dehradun, India.

Over the past two years, I have been actively building full-stack and backend systems with a strong focus on scalability, real-time infrastructure, and developer tooling. My experience includes building production-ready monorepo tools, distributed systems, real-time WebSocket architectures, and indexing pipelines.

My technical focus areas include:

  • Backend systems
  • Search and indexing systems
  • Real-time infrastructure
  • Full-stack web applications
  • Scalable system design

I started contributing to open source last year and have been improving backend systems and real-time features across several projects.

Relevant Development Experience

BuildElevate – Monorepo Starter and CLI Tool
A production-grade CLI tool that scaffolds scalable full-stack monorepos with authentication, CI/CD, and distributed architecture.

Key work:

  • Designed scalable project architecture using Turborepo
  • Implemented secure authentication with OAuth and TOTP
  • Built rate-limited APIs using Redis
  • Configured CI/CD pipelines with parallel workflows

PlayChess – Real-time Online Chess Platform
A real-time multiplayer platform with scalable backend infrastructure.

Key work:

  • Server-authoritative gameplay system
  • Distributed game clock with Redis queues
  • Scalable WebSocket architecture
  • Matchmaking and replay system

Additional projects include a full-stack portfolio platform and a real-time chat application with presence tracking and scalable messaging infrastructure.

These experiences helped me develop the backend, indexing, and system design skills needed for building a scalable AI-powered retrieval system inside Joplin.

I chose this project because Joplin is a privacy-focused, open-source note-taking platform with a strong plugin ecosystem. The idea of building an AI-powered interface over personal knowledge bases is both technically interesting and highly useful for users managing large note collections.


2. Project Summary

Joplin users often accumulate thousands of notes over time, making it difficult to quickly locate specific information using traditional keyword search alone.

This project proposes an AI-powered chat interface that allows users to ask natural language questions about their notes and receive answers grounded in their own content.

The plugin will:

  • Index note content locally
  • Retrieve the most relevant passages using hybrid search
  • Send only relevant context to an AI model
  • Generate grounded responses with references to source notes

The expected outcome is a fully functional Joplin plugin that provides:

  • AI chat interface over notes
  • Hybrid semantic and keyword retrieval
  • Incremental indexing for scalability
  • Privacy-first configuration
  • Answer grounding with references

Out of scope

  • Model training
  • Knowledge graph reasoning
  • Multimodal processing
  • Large changes to the Joplin core application

The plugin will be designed so that it can later evolve into a community-supported extension or, if it proves valuable, be integrated into Joplin core.


3. Technical Approach

System Architecture

The proposal follows a hybrid Retrieval-Augmented Generation architecture.

Joplin notes are retrieved through the plugin API, processed into chunks, embedded locally, and stored in an index that supports semantic retrieval combined with keyword filtering.

This approach fits naturally with the Joplin plugin architecture and allows efficient querying over large note collections.
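To make the chunking step more concrete, here is a minimal sketch of how a note body could be split into passages. It is an illustration under my own assumptions, not a final design: the `Chunk` shape, the `chunkNote` name, and the 800-character default are placeholders, and the sketch does not yet skip lines starting with `#` inside fenced code blocks.

```typescript
// Hypothetical shape for one indexed passage; not part of the Joplin API.
interface Chunk {
  noteId: string;
  heading: string; // nearest Markdown heading above the passage, or "" if none
  text: string;
}

// Split a Markdown note body into passages. A new passage starts at every
// heading, and a passage is also closed at the first paragraph break after
// it has reached maxChars characters.
function chunkNote(noteId: string, body: string, maxChars = 800): Chunk[] {
  const chunks: Chunk[] = [];
  let heading = "";
  let buffer = "";

  const flush = (): void => {
    const text = buffer.trim();
    if (text.length > 0) chunks.push({ noteId, heading, text });
    buffer = "";
  };

  for (const line of body.split(/\r?\n/)) {
    const match = /^#{1,6}\s+(.+)$/.exec(line);
    if (match) {
      // Heading line: close the current passage and remember the new heading.
      flush();
      heading = (match[1] ?? "").trim();
      continue;
    }
    if (line.trim() === "") {
      // Blank line marks a paragraph break.
      if (buffer.length >= maxChars) flush();
      else if (buffer.length > 0) buffer += "\n";
      continue;
    }
    buffer += line + "\n";
  }
  flush();
  return chunks;
}
```

Keeping the nearest heading with each passage is a design choice I would like to validate with mentors, since it gives the retrieval step and the reference display some context beyond the raw text.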


How the plugin integrates with Joplin

The plugin will use the Joplin plugin API for:

  • Reading notes and metadata
  • Displaying a chat interface inside a panel
  • Registering commands to launch chat
  • Storing configuration settings
  • Tracking note updates

The implementation is plugin-first and does not require modifications to Joplin core for the MVP.
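As an illustration of the note-reading part, the sketch below pages through the notes endpoint until no more results remain. The `fetchAllNotes` name and the injected `get` parameter are my own assumptions, chosen so the function can be tested without a running Joplin instance; inside a plugin, `get` would be a thin wrapper around `joplin.data.get`.

```typescript
// Fields requested from the Data API for each note.
interface NoteRecord {
  id: string;
  title: string;
  body: string;
  updated_time: number;
}

// One page of results as returned by the Data API for list endpoints.
interface Page<T> {
  items: T[];
  has_more: boolean;
}

// Mirrors how joplin.data.get is called; injected here (an assumption of
// this sketch) so the logic can run against a fake in tests.
type DataGet = (
  path: string[],
  query: { fields: string[]; page: number; limit: number }
) => Promise<Page<NoteRecord>>;

// Fetch every note by following pagination until has_more is false.
async function fetchAllNotes(get: DataGet, pageSize = 100): Promise<NoteRecord[]> {
  const notes: NoteRecord[] = [];
  let page = 1; // Data API pages are 1-based
  let hasMore = true;
  while (hasMore) {
    const result = await get(["notes"], {
      fields: ["id", "title", "body", "updated_time"],
      page,
      limit: pageSize,
    });
    notes.push(...result.items);
    hasMore = result.has_more;
    page += 1;
  }
  return notes;
}
```

In the plugin this might be called as `fetchAllNotes((path, query) => joplin.data.get(path, query))`. For very large collections, processing each page as it arrives instead of collecting everything first would likely use less memory.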


Libraries and Technologies

  • TypeScript
  • Joplin Plugin API
  • Vector similarity search or lightweight vector database
  • Markdown parser for chunking
  • Local storage for index metadata
  • Embeddings API or local embedding provider
  • LLM API or local model server
  • Testing framework (Jest or equivalent)

Indexing and Retrieval Design

The plugin will:

  • Fetch notes using the Joplin Data API
  • Split notes into smaller passages
  • Generate embeddings for each chunk
  • Store embeddings locally
  • Use keyword search as a pre-filter
  • Run semantic retrieval on candidate passages
  • Send only top results to the AI model

This hybrid retrieval approach is intended to improve answer accuracy, and it avoids sending entire notes to the model.
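The retrieval steps listed above can be sketched roughly as follows. This is a simplified illustration, not the final pipeline: `IndexedChunk`, `hybridSearch`, and the fallback to the full index when no keyword matches are my own assumptions, the tokenizer handles only ASCII letters and digits, and embeddings are assumed to have been computed elsewhere.

```typescript
// Hypothetical shape of a stored passage with its embedding vector.
interface IndexedChunk {
  noteId: string;
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors; returns 0 if either has zero norm.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Very simple tokenizer: lowercase runs of ASCII letters and digits only.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Keyword pre-filter followed by semantic ranking of the candidates.
function hybridSearch(
  query: string,
  queryEmbedding: number[],
  index: IndexedChunk[],
  topK = 5
): IndexedChunk[] {
  const terms = new Set(tokenize(query));
  // Keep passages that share at least one term with the query.
  const candidates = index.filter((c) => tokenize(c.text).some((t) => terms.has(t)));
  // Assumption: if nothing matches by keyword, rank the whole index instead.
  const pool = candidates.length > 0 ? candidates : index;
  return pool
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((r) => r.chunk);
}
```

A strict pre-filter can miss passages that are semantically relevant but use different wording, so whether to filter or to blend keyword and semantic scores is something I expect to tune against the evaluation queries.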


Incremental Indexing Strategy

To support large note collections efficiently:

  • Track each note's updated_time
  • Re-embed only modified notes
  • Maintain an index metadata store
  • Batch embedding updates
  • Rebuild the index only when necessary

Because only new or modified notes are re-embedded, the cost of an update scales with the number of edits rather than with the size of the whole collection.
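A minimal sketch of the change-detection logic, assuming the metadata store is a simple map from note ID to the updated_time recorded when the note was last embedded. The names `planIndexUpdate`, `IndexMetadata`, and `toBatches` are hypothetical placeholders for this proposal.

```typescript
// Only the fields needed for change detection.
interface NoteStamp {
  id: string;
  updated_time: number;
}

// Hypothetical metadata store: note id -> updated_time when last embedded.
type IndexMetadata = Map<string, number>;

interface IndexPlan {
  toEmbed: string[]; // new or modified notes
  toRemove: string[]; // notes present in the index but no longer in Joplin
}

// Compare current notes with the metadata store and decide what to update.
function planIndexUpdate(notes: NoteStamp[], metadata: IndexMetadata): IndexPlan {
  const seen = new Set<string>();
  const toEmbed: string[] = [];
  for (const note of notes) {
    seen.add(note.id);
    const indexedAt = metadata.get(note.id);
    if (indexedAt === undefined || note.updated_time > indexedAt) {
      toEmbed.push(note.id);
    }
  }
  const toRemove = Array.from(metadata.keys()).filter((id) => !seen.has(id));
  return { toEmbed, toRemove };
}

// Split work into fixed-size batches for the embedding provider.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

For example, `toBatches(plan.toEmbed, 32)` would produce groups of at most 32 note IDs to embed per request; the batch size here is only illustrative and would depend on the provider.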


Answer Grounding

Each AI response will include references to the notes used to generate the answer.

This ensures:

  • Transparency
  • Verifiability
  • Trust in AI responses

The chat interface will display:

  • Note title
  • Relevant snippet
  • Link to open the note inside Joplin

This helps reduce hallucinations and lets users verify that each answer is grounded in their own data.
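One possible way to connect answers to their sources is to number the retrieved passages in the prompt and map citation markers in the answer back to notes. The sketch below illustrates that idea only; the `Passage` shape, the prompt wording, and the `[n]` citation convention are my assumptions and would be refined during the project.

```typescript
// Hypothetical shape of a retrieved passage shown to the model and the user.
interface Passage {
  noteId: string;
  noteTitle: string;
  snippet: string;
}

// Build a prompt in which every passage carries a number the model can cite.
function buildGroundedPrompt(question: string, passages: Passage[]): string {
  const context = passages
    .map((p, i) => `[${i + 1}] ${p.noteTitle}\n${p.snippet}`)
    .join("\n\n");
  return [
    "Answer the question using only the numbered passages below.",
    "Cite passages as [n]. If the passages do not contain the answer, say so.",
    "",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}

// Map markers such as [2] in the model's answer back to the cited passages.
// Numbers that do not correspond to a passage are ignored.
function resolveCitations(answer: string, passages: Passage[]): Passage[] {
  const cited = new Set<number>();
  const pattern = /\[(\d+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    cited.add(Number(match[1]));
  }
  return passages.filter((_, i) => cited.has(i + 1));
}
```

The passages returned by `resolveCitations` carry the note ID, title, and snippet needed for the reference list in the chat panel.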


Privacy and Offline Design

Privacy is a core design principle.

The plugin will:

  • Keep indexes stored locally
  • Send minimal context to AI models
  • Support local model providers
  • Provide clear UI controls for data sharing

This aligns with Joplin’s privacy-first philosophy.
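As a rough sketch of how the data-sharing controls could be enforced in code, the function below refuses to prepare context for a remote provider unless the user has opted in, and caps the amount of text that leaves the index. The `PrivacySettings` fields and the character-based budget are assumptions for illustration; the real settings would be defined through the plugin's configuration.

```typescript
type ProviderKind = "local" | "remote";

// Hypothetical user-facing settings for data sharing.
interface PrivacySettings {
  provider: ProviderKind;
  allowRemote: boolean; // explicit opt-in for sending note text off-device
  maxContextChars: number; // upper bound on text included in one request
}

// Select the snippets that may be sent to the model. Snippets are assumed
// to be ordered best-first; selection stops at the first one that would
// exceed the budget.
function selectContext(snippets: string[], settings: PrivacySettings): string[] {
  if (settings.provider === "remote" && !settings.allowRemote) {
    throw new Error("Remote AI providers are disabled in the privacy settings.");
  }
  const selected: string[] = [];
  let used = 0;
  for (const snippet of snippets) {
    if (used + snippet.length > settings.maxContextChars) break;
    selected.push(snippet);
    used += snippet.length;
  }
  return selected;
}
```

The same budget also helps with prompt size limits, since the context sent to the model never exceeds `maxContextChars`.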


Architecture diagram


Potential Challenges

  • Handling very large note collections
  • Maintaining index consistency after edits
  • Managing prompt size limitations
  • Supporting multiple AI providers
  • Ensuring good performance across systems

Performance Expectations

Initial indexing target:

  • 5,000 notes in under 10 minutes (roughly 8 or more notes per second), depending on the embedding provider

Query response time:

  • 1–3 seconds retrieval
  • 2–5 seconds answer generation

Memory usage will be optimized through batching and lazy loading.


Testing Strategy

Testing will focus on:

  • Chunking correctness
  • Incremental indexing
  • Retrieval accuracy
  • Error handling
  • UI interactions
  • Search pipeline stability

Evaluation queries and datasets will be shared with mentors during development.
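One way retrieval accuracy could be measured against such a dataset is recall at k: the fraction of known-relevant notes that appear in the top k results. The sketch below assumes a hypothetical `EvalCase` format in which each query is labelled with the IDs of the notes that should be found; the metric choice itself is a suggestion to discuss with mentors.

```typescript
// Hypothetical evaluation record: a query and the notes that answer it.
interface EvalCase {
  query: string;
  relevantNoteIds: string[];
}

// Fraction of relevant notes that appear among the first k retrieved IDs.
function recallAtK(retrieved: string[], relevant: string[], k: number): number {
  if (relevant.length === 0) return 0;
  const topK = new Set(retrieved.slice(0, k));
  const hits = relevant.filter((id) => topK.has(id)).length;
  return hits / relevant.length;
}

// Average recall at k over a dataset, given a search function that returns
// note IDs ordered best-first.
function meanRecallAtK(
  cases: EvalCase[],
  search: (query: string) => string[],
  k: number
): number {
  if (cases.length === 0) return 0;
  let total = 0;
  for (const c of cases) {
    total += recallAtK(search(c.query), c.relevantNoteIds, k);
  }
  return total / cases.length;
}
```

Tracking this number across changes to chunking and ranking would show whether a change actually helps retrieval.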


Documentation Plan

Documentation will include:

  • Installation guide
  • Configuration instructions
  • Indexing behavior explanation
  • Privacy model
  • Developer notes for extension

4. Implementation Plan

Community Bonding Period

  • Finalize architecture with mentors
  • Set up plugin development environment
  • Confirm retrieval and indexing design
  • Prepare evaluation dataset

Week 1–2

  • Implement plugin structure
  • Create chat UI panel
  • Connect to Joplin note retrieval

Week 3–4

  • Implement note chunking
  • Build indexing system
  • Generate embeddings
  • Store metadata

Week 5–6

  • Implement retrieval pipeline
  • Add keyword filtering
  • Combine semantic and keyword ranking

Midterm milestone

  • Working indexing system
  • Retrieval pipeline functional
  • Basic chat interface working

Week 7–8

  • Integrate LLM responses
  • Build prompt construction
  • Add source references
  • Improve UI feedback

Week 9–10

  • Add privacy controls
  • Optimize indexing updates
  • Improve performance

Week 11–12

  • Finalize testing
  • Improve UI and stability
  • Complete documentation

Final milestone

  • Fully functional plugin with AI chat over notes
  • Hybrid retrieval system
  • Incremental indexing
  • Documentation and tests completed

5. Deliverables

  • AI chat interface plugin for Joplin
  • Hybrid retrieval system for notes
  • Local semantic search index
  • Incremental indexing pipeline
  • Automated tests
  • User and developer documentation

6. Availability

I will be available for approximately 40 hours per week during GSoC.

Timezone: IST (India)

Typical working hours:

  • 15:30–20:30 IST (10:00–15:00 UTC)
  • 21:00–04:00 IST (15:30–22:30 UTC)

I have no major commitments during the summer.

My college exams will take place on four days within a two-week period in May, with each exam lasting approximately 3–5 hours. This will not significantly affect my availability.


Summary

I am excited about contributing to Joplin by building a privacy-first AI interface that allows users to interact with their personal knowledge base more effectively. This project combines scalable retrieval systems, practical AI integration, and Joplin’s extensible plugin ecosystem.

I look forward to collaborating with mentors and the community to deliver a high-quality and useful plugin.