GSoC 2026 Proposal Draft – Idea 4: Chat with your note collection using AI – Abdul Rafay

AI Assistance Disclosure

I used Claude AI to help with the wording and structure of this proposal. I reviewed all the output carefully and made sure everything reflects my actual experience and understanding of the project.

Links

Project Idea: https://github.com/joplin/gsoc/blob/master/ideas.md#4-chat-with-your-note-collection-using-ai

Portfolio Website: https://abdul-rafay-three.vercel.app/

1. Introduction

I am Abdul Rafay, a Cyber Security student at FAST NUCES Islamabad, currently in my third year. Even though my degree is in Cyber Security, I have spent the last two years building real, production-ready full-stack applications using TypeScript, React.js, Node.js, and Express. I do not just build toy projects. I build systems that real users can depend on, with proper authentication, error handling, validation, and security built in from the start.

My most relevant project is Memora, an AI-powered educational platform where I built the entire backend myself. In that project I connected OpenAI's API to a Node.js backend, handled user-uploaded files like PDFs and audio, turned them into structured educational content using prompt engineering, parsed and validated the AI responses, and made sure the whole pipeline was robust enough to handle edge cases without crashing. That is almost exactly what this project requires.

I have also built Aurixon, an enterprise multi-tenant SaaS platform with PostgreSQL, complex calculation engines, and auto-generated PDF reports. These projects show that I can handle real architectural decisions, not just follow tutorials.

I already know the Joplin codebase from the pull request I submitted, which means I am not starting from zero when GSoC begins.

2. Project Summary

Many Joplin users have built up huge note collections over years. Some people have thousands of notes clipped from websites, personal writing, research, and ideas. Right now, the only way to find something in those notes is to search for keywords. That works fine for simple lookups, but it fails completely when you want to ask a question like "What did I read about machine learning last year?" or "Summarize everything I know about nutrition."

This project fixes that problem by adding a chat interface to Joplin where users can have a real conversation with their note collection. The user types a question, the system finds the most relevant notes, sends them to an AI model along with the question, and gives back a clear, sourced answer. The user can then ask follow-up questions to go deeper.

The final product will be a Joplin plugin with a clean React-based chat UI that works completely locally and respects user privacy. Users will be able to choose their AI provider so they are not locked into one service.

What is out of scope for this project is real-time note syncing during a chat session, support for non-text note content like drawings or attachments, and any cloud storage of user notes or chat history. Keeping the scope realistic means I can deliver something polished and working within 12 weeks instead of something half-finished and buggy.

3. Technical Approach

The system has three main parts: ingesting and indexing the notes, finding the right notes when a user asks a question, and sending those notes to an AI model to generate an answer.

Note Ingestion and Indexing

Joplin exposes a local REST API that plugins can use to read notes. When the plugin starts for the first time, it will read all notes through this API and break them into smaller chunks. This chunking is important because AI models have a limit on how much text they can process at once, and many notes exceed that limit. Each chunk will be turned into a vector embedding, a numeric vector that captures the meaning of the text. These embeddings will be stored locally using a library like vectra or hnswlib-node, so nothing ever leaves the user's machine unless they choose a cloud AI provider.
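To make the chunking step concrete, here is a minimal sketch of the kind of helper I have in mind. The function name, parameters, and size limits are my own illustrative choices, not Joplin API calls; the real plugin would feed it note bodies fetched through Joplin's data API, and chunk sizes would be tuned to the embedding model's input budget.

```typescript
// Illustrative note-chunking helper (hypothetical name and parameters).
// Splits a note body into overlapping chunks so each chunk fits within
// an embedding model's input limit while keeping some context across
// chunk boundaries.
function chunkText(text: string, maxLen = 500, overlap = 50): string[] {
  const chunks: string[] = [];
  if (text.trim().length === 0) return chunks; // empty notes yield no chunks

  let start = 0;
  while (start < text.length) {
    const end = Math.min(start + maxLen, text.length);
    chunks.push(text.slice(start, end));
    if (end === text.length) break;
    start = end - overlap; // step back so consecutive chunks overlap
  }
  return chunks;
}
```

The overlap means a sentence cut at a chunk boundary still appears whole in the neighboring chunk, which helps retrieval quality. A production version would split on paragraph or sentence boundaries rather than raw character offsets, but the overall shape is the same.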

I have experience designing data processing pipelines from my work on Memora where I processed user-uploaded PDFs and audio files into structured data. The same architectural thinking applies here.

Finding Relevant Notes

When the user asks a question, the plugin will turn that question into an embedding using the same model and then search the local index for the chunks that are most similar in meaning. This is called semantic search, and it is much more powerful than keyword search because it matches intent rather than exact words. The top matching chunks will be collected and used as context for the AI.
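The retrieval step above can be sketched as a cosine-similarity top-k search. This is purely illustrative: the real plugin would delegate nearest-neighbor search to the chosen vector library (vectra or hnswlib-node), and the types and names here are my own.

```typescript
// Illustrative semantic-search sketch (types and names are hypothetical;
// the real plugin would use a vector store's own query API).
type Chunk = { noteId: string; text: string; embedding: number[] };

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the k chunks whose embeddings are closest in meaning to the query.
function topK(query: number[], index: Chunk[], k: number): Chunk[] {
  return [...index]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
}
```

A dedicated index avoids this brute-force scan by using approximate nearest-neighbor structures, which is why I plan to benchmark libraries during community bonding rather than roll my own search.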

Generating the Answer

The relevant note chunks plus the user's question will be sent to an AI model through a carefully designed prompt. I have hands-on experience with prompt engineering from Memora where I had to design prompts that reliably produced structured JSON output from OpenAI. I will apply the same approach here, making sure the AI is instructed to only answer based on the provided notes and to cite which notes it used. The plugin will support at minimum OpenAI and Ollama so users who want full privacy can run everything locally.
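As a sketch of the grounding approach described above, here is one way the prompt could be assembled. The exact wording and structure are illustrative placeholders that I would refine with mentor feedback, not a final design; the key ideas from the paragraph are visible in code: answer only from the supplied notes, and cite sources.

```typescript
// Illustrative prompt builder (hypothetical types and prompt wording).
type SourceChunk = { noteTitle: string; text: string };

function buildPrompt(question: string, chunks: SourceChunk[]): string {
  // Number each chunk so the model can cite it as [1], [2], ...
  const context = chunks
    .map((c, i) => `[${i + 1}] (from "${c.noteTitle}")\n${c.text}`)
    .join("\n\n");

  return [
    "Answer the question using ONLY the notes below.",
    "Cite sources as [n]. If the notes do not contain the answer, say so.",
    "",
    "Notes:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Because each chunk carries its note title, the UI can map a citation like [2] back to the original note and offer a click-through, which is the behavior described in the Chat UI section.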

The Chat UI

The chat interface will be built as a Joplin plugin panel using React and TypeScript, matching Joplin's existing look and feel. It will show the conversation history, display which notes were used to generate each answer, and allow the user to click through to those notes directly. I built the frontend for both Memora and Aurixon using React and TypeScript so this is familiar ground for me.

Testing and Documentation

Every core function will have unit tests written in Jest. I will also write integration tests that simulate the full flow from a user question to a final answer. All public functions will have JSDoc comments, and I will write a clear developer guide explaining the architecture and a user guide explaining how to install and use the plugin.

4. Implementation Plan

Weeks 1 and 2: Community Bonding and Setup

Get deeply familiar with the Joplin plugin API and codebase. Set up the development environment. Finalize the exact libraries for embeddings and vector storage after benchmarking a few options. Write a short design document and share it with mentors for feedback before writing any real code.

Weeks 3 and 4: Note Ingestion Pipeline

Build the system that reads all notes from Joplin's local API, chunks them into manageable pieces, generates embeddings for each chunk, and stores them in the local vector index. Handle edge cases like empty notes, very long notes, and special characters. Write unit tests for every function in this part.

Weeks 5 and 6: Semantic Search

Build the search function that takes a user question, embeds it, and retrieves the most relevant note chunks from the index. Test it against a realistic note collection to make sure the results are actually relevant. Add a ranking step to filter out low-quality matches.

Weeks 7 and 8: AI Integration and Prompt Engineering

Connect the retrieved note chunks to an AI model using a well-designed prompt. Handle streaming responses so the answer appears word by word instead of all at once, which feels much more natural. Make the AI provider configurable so users can switch between OpenAI and Ollama. Write tests to make sure the integration handles API errors gracefully.
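The streaming behavior planned above can be kept provider-agnostic. As a sketch, assuming the OpenAI and Ollama clients are wrapped to expose the answer as an async iterable of text fragments (an assumption about the wrapper, not either library's exact API), the UI-facing code only needs one small consumer:

```typescript
// Illustrative streaming consumer (the AsyncIterable wrapper around the
// provider clients is an assumed abstraction, not a specific library API).
async function collectStream(
  stream: AsyncIterable<string>,
  onToken: (t: string) => void,
): Promise<string> {
  let full = "";
  for await (const token of stream) {
    full += token;
    onToken(token); // e.g. append the fragment to the chat panel as it arrives
  }
  return full; // complete answer, kept for conversation history
}
```

Keeping provider differences behind one interface like this is also what makes the OpenAI/Ollama switch a configuration change rather than a code change.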

Weeks 9 and 10: Chat UI

Build the React plugin panel with the chat interface. Show conversation history, display source notes for each answer, and add a button to jump to any cited note. Make sure it matches Joplin's theme and works on all platforms Joplin supports. This is the part users will actually see so I will invest real effort into making it clean and easy to use.

Weeks 11 and 12: Testing, Polish, and Documentation

Run the full test suite and fix any bugs. Test on different operating systems. Write the developer documentation and user guide. Address any feedback from mentors. Prepare the final submission.

5. Deliverables

At the end of the project, the following things will exist:

A fully working Joplin plugin that lets users chat with their note collection. It will include local note ingestion and embedding, semantic search over the note index, AI-powered answer generation with source citations, a clean React chat UI panel, and support for both OpenAI and Ollama. Alongside the plugin I will deliver a complete test suite with unit and integration tests, a developer guide explaining the architecture, and a user guide explaining how to install and use it.

6. Availability

I am in my sixth semester at FAST NUCES. My academic schedule takes roughly 20 to 25 hours per week including classes and coursework. That leaves me comfortably able to commit 25 to 30 hours per week to GSoC, which covers the 350-hour large project across the 12-week coding period.

I am in the Pakistan Standard Time zone (UTC+5). I am flexible about meeting times and can adjust for mentor availability in other time zones. I have no planned travel or vacations during the GSoC coding period. I will give this project my full focus during the summer.

I am reachable daily through the Joplin forum, GitHub, and email, and I commit to responding to mentor feedback within 24 hours and pushing code updates at least three times per week so progress is always visible.