GSoC 2026 Proposal Draft – Idea 4: Chat with your note collection using AI – Vijay_Singh

vijaysingh2219 · 31 March 2026 08:50

Links

Project Idea - Link

GitHub profile - vijaysingh2219

Forum introduction post - Link

Pull requests you have submitted to Joplin
Currently none. I am preparing to contribute improvements related to plugin tooling and testing as I continue engaging with the community.

1. Introduction

I am Vijay Singh, a final year Computer Science and Engineering student at Graphic Era University, Dehradun, India.

Over the past two years, I have been actively building full-stack and backend systems with a strong focus on scalability, real-time infrastructure, and developer tooling. My experience includes building production-ready monorepo tools, distributed systems, real-time WebSocket architectures, and indexing pipelines.

My technical focus areas include:

Backend systems
Search and indexing systems
Real-time infrastructure
Full-stack web applications
Scalable system design

I started contributing to open source last year and have been improving backend systems and real-time features across several projects.

Relevant Development Experience

BuildElevate – Monorepo Starter and CLI Tool
A production-grade CLI tool that scaffolds scalable full-stack monorepos with authentication, CI/CD, and distributed architecture.

Key work:

Designed scalable project architecture using Turborepo
Implemented secure authentication with OAuth and TOTP
Built rate-limited APIs using Redis
Configured CI/CD pipelines with parallel workflows

PlayChess – Real-time Online Chess Platform
A real-time multiplayer platform with scalable backend infrastructure.

Key work:

Server-authoritative gameplay system
Distributed game clock with Redis queues
Scalable WebSocket architecture
Matchmaking and replay system

Additional projects include a full-stack portfolio platform and a real-time chat application with presence tracking and scalable messaging infrastructure.

These experiences helped me develop the backend, indexing, and system design skills needed for building a scalable AI-powered retrieval system inside Joplin.

I chose this project because Joplin is a privacy-focused, open-source note-taking platform with a strong plugin ecosystem. The idea of building an AI-powered interface over personal knowledge bases is both technically interesting and highly useful for users managing large note collections.

2. Project Summary

Joplin users often accumulate thousands of notes over time, making it difficult to quickly locate specific information using traditional keyword search alone.

This project proposes an AI-powered chat interface that allows users to ask natural language questions about their notes and receive answers grounded in their own content.

The plugin will:

Index note content locally
Retrieve the most relevant passages using hybrid search
Send only relevant context to an AI model
Generate grounded responses with references to source notes

The expected outcome is a fully functional Joplin plugin that provides:

AI chat interface over notes
Hybrid semantic and keyword retrieval
Incremental indexing for scalability
Privacy-first configuration
Answer grounding with references

Out of scope

Model training
Knowledge graph reasoning
Multimodal processing
Large changes to the Joplin core application

The plugin will be designed so that it can later evolve into a community-supported extension or be integrated into Joplin core if it proves valuable.

3. Technical Approach

System Architecture

The proposal follows a hybrid Retrieval-Augmented Generation architecture.

Joplin notes are retrieved through the plugin API, split into chunks, embedded using a local model or a configured embeddings provider, and stored in a local index that supports semantic retrieval combined with keyword filtering.

This approach fits naturally with the Joplin plugin architecture and allows efficient querying over large note collections.
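As a rough illustration of the chunking step, the sketch below splits a Markdown note body into one passage per heading section and packs paragraphs up to a size limit. The names `Chunk` and `chunkNote` and the 800-character default are assumptions made for this example; the final chunking rules would be agreed with mentors.

```typescript
// Hypothetical types and names, used only for illustration.
interface Chunk {
  noteId: string;
  heading: string; // nearest Markdown heading above the passage ("" if none)
  text: string;
}

// Split a note body into passages: one group per heading section,
// with paragraphs packed greedily so a passage stays under maxChars
// unless a single paragraph is longer than the limit by itself.
function chunkNote(noteId: string, body: string, maxChars = 800): Chunk[] {
  const chunks: Chunk[] = [];
  let heading = "";
  let buffer: string[] = [];

  const flush = (): void => {
    const text = buffer.join("\n").trim();
    buffer = [];
    if (text.length === 0) return;
    let current = "";
    for (const para of text.split(/\n\s*\n/)) {
      if (current && current.length + para.length + 2 > maxChars) {
        chunks.push({ noteId, heading, text: current });
        current = para;
      } else {
        current = current ? `${current}\n\n${para}` : para;
      }
    }
    if (current) chunks.push({ noteId, heading, text: current });
  };

  for (const line of body.split("\n")) {
    const match = /^#{1,6}\s+(.*)$/.exec(line);
    if (match) {
      flush(); // close the previous section under its own heading
      heading = match[1].trim();
    } else {
      buffer.push(line);
    }
  }
  flush();
  return chunks;
}
```

This simple version treats any line starting with `#` as a heading, so a comment line inside a fenced code block would be misread; a Markdown parser would avoid that.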

How the plugin integrates with Joplin

The plugin will use the Joplin plugin API for:

Reading notes and metadata
Displaying a chat interface inside a panel
Registering commands to launch chat
Storing configuration settings
Tracking note updates

The implementation is plugin-first and does not require modifications to Joplin core for the MVP.
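A minimal sketch of reading notes page by page is shown below. It assumes the paginated response shape of the Joplin data API (`items` and `has_more`); the data call is passed in as a parameter so the function can run outside Joplin, and `NoteRecord`, `DataGet`, and `fetchAllNotes` are hypothetical names.

```typescript
// Hypothetical types for illustration; the field names follow Joplin note properties.
interface NoteRecord {
  id: string;
  title: string;
  body: string;
  updated_time: number;
}

interface Page<T> {
  items: T[];
  has_more: boolean;
}

// The data call is injected. Inside a plugin it could be supplied as
// (path, query) => joplin.data.get(path, query).
type DataGet = (path: string[], query: Record<string, unknown>) => Promise<Page<NoteRecord>>;

// Walk through every page of notes until the API reports no more results.
async function fetchAllNotes(dataGet: DataGet, pageSize = 100): Promise<NoteRecord[]> {
  const notes: NoteRecord[] = [];
  let page = 1;
  while (true) {
    const result = await dataGet(["notes"], {
      fields: ["id", "title", "body", "updated_time"],
      page,
      limit: pageSize,
    });
    notes.push(...result.items);
    if (!result.has_more) break;
    page += 1;
  }
  return notes;
}
```

Collecting every note into one array keeps the example short; for large collections each page could be chunked and embedded as it arrives instead.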

Libraries and Technologies

TypeScript
Joplin Plugin API
Vector similarity search or lightweight vector database
Markdown parser for chunking
Local storage for index metadata
Embeddings API or local embedding provider
LLM API or local model server
Testing framework (Jest or equivalent)

Indexing and Retrieval Design

The plugin will:

Fetch notes using the Joplin Data API
Split notes into smaller passages
Generate embeddings for each chunk
Store embeddings locally
Use keyword search as a pre-filter
Run semantic retrieval on candidate passages
Send only top results to the AI model

This hybrid retrieval is intended to improve answer accuracy over keyword search alone, and it avoids sending entire notes to the model.
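The two retrieval stages could look roughly like the sketch below: a keyword pre-filter selects candidate chunks, then cosine similarity ranks them against the query embedding. The names are hypothetical, the tokenizer is ASCII-only for brevity, and the fallback to the full index when no keyword matches is an assumption of this example rather than a settled design decision.

```typescript
// Hypothetical index entry: a passage with its precomputed embedding.
interface IndexedChunk {
  noteId: string;
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Simplified tokenizer: lowercase ASCII words and digits only.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

function hybridSearch(
  query: string,
  queryEmbedding: number[],
  index: IndexedChunk[],
  topK = 5,
): IndexedChunk[] {
  const terms = new Set(tokenize(query));
  // Stage 1: keep chunks that share at least one term with the query.
  const candidates = index.filter((c) => tokenize(c.text).some((t) => terms.has(t)));
  // Assumption: if nothing matches, search the whole index so that
  // paraphrased questions can still be answered semantically.
  const pool = candidates.length > 0 ? candidates : index;
  // Stage 2: rank the remaining chunks by embedding similarity.
  return pool
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((r) => r.chunk);
}
```

A strict pre-filter is cheap but can hide passages that use different wording, which is why the ranking step in weeks 5–6 combines both signals rather than relying on the filter alone.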

Incremental Indexing Strategy

To support large note collections efficiently:

Track note updated_time
Re-embed only modified notes
Maintain index metadata store
Batch embedding updates
Rebuild index only when necessary

Because only changed notes are re-embedded, the cost of an update scales with the number of edits rather than with the size of the collection.
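The comparison against the index metadata store could be as simple as the sketch below, which decides which notes need embedding and which index entries are stale. `NoteStamp`, `IndexPlan`, and `planIndexUpdate` are hypothetical names, and the metadata store is modelled as an in-memory map from note ID to the `updated_time` recorded at the last embedding.

```typescript
// Hypothetical shapes for illustration.
interface NoteStamp {
  id: string;
  updated_time: number;
}

interface IndexPlan {
  toEmbed: string[]; // new or modified notes
  toRemove: string[]; // notes that are indexed but no longer exist
}

// Compare current notes against the timestamps stored at the last index run.
function planIndexUpdate(notes: NoteStamp[], indexed: Map<string, number>): IndexPlan {
  const toEmbed: string[] = [];
  const seen = new Set<string>();
  for (const note of notes) {
    seen.add(note.id);
    const last = indexed.get(note.id);
    // Any timestamp difference triggers a re-embed, including a note
    // whose timestamp moved backwards after a sync conflict.
    if (last === undefined || note.updated_time !== last) toEmbed.push(note.id);
  }
  const toRemove = Array.from(indexed.keys()).filter((id) => !seen.has(id));
  return { toEmbed, toRemove };
}
```

For example, with notes `a` (unchanged), `b` (edited) and `c` (new), and a previously indexed note `d` that has since been deleted, the plan is to embed `b` and `c` and remove `d`.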

Answer Grounding

Each AI response will include references to the notes used to generate the answer.

This ensures:

Transparency
Verifiability
Trust in AI responses

The chat interface will display:

Note title
Relevant snippet
Link to open the note inside Joplin

Showing sources lets users verify each answer against their own notes and makes hallucinated claims easier to spot.
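One possible way to connect answers to sources is to number the retrieved passages in the prompt and map citation markers in the answer back to notes, as in the sketch below. The prompt wording, the `[n]` citation format, and all type and function names are assumptions made for this example.

```typescript
// Hypothetical shapes for illustration.
interface Passage {
  noteId: string;
  noteTitle: string;
  text: string;
}

interface SourceRef {
  label: number; // the [n] marker used in the prompt
  noteId: string;
  noteTitle: string;
  snippet: string;
}

// Build a prompt whose context is a numbered list of passages, and
// return the matching source list for display in the chat panel.
function buildGroundedPrompt(
  question: string,
  passages: Passage[],
  snippetChars = 160,
): { prompt: string; sources: SourceRef[] } {
  const sources = passages.map((p, i) => ({
    label: i + 1,
    noteId: p.noteId,
    noteTitle: p.noteTitle,
    snippet: p.text.length > snippetChars ? p.text.slice(0, snippetChars) + "…" : p.text,
  }));
  const context = passages.map((p, i) => `[${i + 1}] (${p.noteTitle})\n${p.text}`).join("\n\n");
  const prompt = [
    "Answer the question using only the numbered excerpts below.",
    "Cite excerpts as [n]. If the excerpts do not contain the answer, say so.",
    "",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
  return { prompt, sources };
}

// Keep only the sources whose [n] marker appears in the model's answer.
function citedSources(answer: string, sources: SourceRef[]): SourceRef[] {
  const labels = new Set<number>();
  for (const marker of answer.match(/\[(\d+)\]/g) ?? []) {
    labels.add(Number(marker.slice(1, -1)));
  }
  return sources.filter((s) => labels.has(s.label));
}
```

Each `SourceRef` carries the note title, snippet, and note ID that the chat interface needs; the panel could pass the ID back to the plugin to open the note, for example through Joplin's `openNote` command.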

Privacy and Offline Design

Privacy is a core design principle.

The plugin will:

Keep indexes stored locally
Send minimal context to AI models
Support local model providers
Provide clear UI controls for data sharing

This aligns with Joplin’s privacy-first philosophy.

Architecture diagram

Potential Challenges

Handling very large note collections
Maintaining index consistency after edits
Managing prompt size limitations
Supporting multiple AI providers
Ensuring good performance across systems
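For the prompt size challenge, one simple option is to pack ranked passages into a fixed token budget, as sketched below. The four-characters-per-token estimate is a rough rule of thumb assumed for this example, not a property of any particular model, and the function names are hypothetical.

```typescript
// Assumption: about 4 characters per token. A real implementation
// would use the tokenizer of the configured model where available.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Walk passages in rank order and keep each one that still fits.
// A passage that is too large is skipped so that smaller, lower-ranked
// passages can still use the remaining budget.
function fitToBudget(rankedPassages: string[], maxTokens: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const passage of rankedPassages) {
    const cost = estimateTokens(passage);
    if (used + cost > maxTokens) continue;
    kept.push(passage);
    used += cost;
  }
  return kept;
}
```

Stopping at the first passage that does not fit would preserve strict rank order instead; which behaviour gives better answers is something the evaluation queries could help decide.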

Performance Expectations

Initial indexing target:

5,000 notes in under 10 minutes, depending on the embedding provider

Query response time:

1–3 seconds for retrieval
2–5 seconds for answer generation

Memory usage will be kept bounded by embedding in batches and loading note bodies lazily rather than holding the whole collection in memory.
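For scale, 5,000 notes in 10 minutes works out to roughly 8.3 notes per second (5,000 / 600), so requests to the embedding provider would need to be grouped. The sketch below shows one way to batch them; `EmbedBatch` stands in for whichever provider is configured, and the batch size of 32 is an arbitrary assumption.

```typescript
// Yield consecutive slices of at most `size` items, one at a time.
function* batches<T>(items: T[], size: number): Generator<T[]> {
  for (let i = 0; i < items.length; i += size) {
    yield items.slice(i, i + size);
  }
}

// Hypothetical provider call: takes several texts, returns one vector per text.
type EmbedBatch = (texts: string[]) => Promise<number[][]>;

// Embed texts one batch at a time, so only a single request is in flight.
async function embedAll(texts: string[], embedBatch: EmbedBatch, size = 32): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const batch of batches(texts, size)) {
    vectors.push(...(await embedBatch(batch)));
  }
  return vectors;
}
```

This version returns all vectors at once for simplicity; writing each batch to the local index as it completes would keep memory use flat on large collections.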

Testing Strategy

Testing will focus on:

Chunking correctness
Incremental indexing
Retrieval accuracy
Error handling
UI interactions
Search pipeline stability

Evaluation queries and datasets will be shared with mentors during development.
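Retrieval accuracy could be tracked with a standard metric such as recall@k over the evaluation queries. The sketch below is one way to compute it; the `EvalCase` shape and the `Retriever` signature are assumptions for this example, and the choice of metric is open to discussion with mentors.

```typescript
// Hypothetical evaluation case: a query and the notes that should be found.
interface EvalCase {
  query: string;
  relevantNoteIds: string[];
}

// Hypothetical retriever: returns note IDs ranked from best to worst.
type Retriever = (query: string, k: number) => string[];

// Mean recall@k: for each query, the fraction of relevant notes that
// appear in the top k results, averaged over all queries.
function recallAtK(cases: EvalCase[], retrieve: Retriever, k: number): number {
  if (cases.length === 0) return 0;
  let total = 0;
  for (const c of cases) {
    const retrieved = new Set(retrieve(c.query, k).slice(0, k));
    const hits = c.relevantNoteIds.filter((id) => retrieved.has(id)).length;
    total += c.relevantNoteIds.length === 0 ? 0 : hits / c.relevantNoteIds.length;
  }
  return total / cases.length;
}
```

Running this in the test suite against a fixed set of notes would make it possible to catch retrieval regressions when chunking or ranking changes.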

Documentation Plan

Documentation will include:

Installation guide
Configuration instructions
Indexing behavior explanation
Privacy model
Developer notes for extension

4. Implementation Plan

Community Bonding Period

Finalize architecture with mentors
Set up plugin development environment
Confirm retrieval and indexing design
Prepare evaluation dataset

Week 1–2

Implement plugin structure
Create chat UI panel
Connect to Joplin note retrieval

Week 3–4

Implement note chunking
Build indexing system
Generate embeddings
Store metadata

Week 5–6

Implement retrieval pipeline
Add keyword filtering
Combine semantic and keyword ranking

Midterm milestone

Working indexing system
Retrieval pipeline functional
Basic chat interface working

Week 7–8

Integrate LLM responses
Build prompt construction
Add source references
Improve UI feedback

Week 9–10

Add privacy controls
Optimize indexing updates
Improve performance

Week 11–12

Finalize testing
Improve UI and stability
Complete documentation

Final milestone

Fully functional plugin with AI chat over notes
Hybrid retrieval system
Incremental indexing
Documentation and tests completed

5. Deliverables

AI chat interface plugin for Joplin
Hybrid retrieval system for notes
Local semantic search index
Incremental indexing pipeline
Automated tests
User and developer documentation

6. Availability

I will be available approximately 40 hours per week during GSoC.

Timezone: IST (India)

Typical working hours:

15:30–20:30 IST (10:00–15:00 UTC)
21:00–04:00 IST (15:30–22:30 UTC)

I have no major commitments during the summer.

My college exams will take place on four days within a two-week period in May, with each exam lasting approximately 3–5 hours. This will not significantly impact my availability.

Summary

I am excited about contributing to Joplin by building a privacy-first AI interface that allows users to interact with their personal knowledge base more effectively. This project combines scalable retrieval systems, practical AI integration, and Joplin’s extensible plugin ecosystem.

I look forward to collaborating with mentors and the community to deliver a high-quality and useful plugin.
