GSoC 2026 Proposal Draft – Idea # 1: AI Supported Search For Notes – Muhammad Zohaib Irshad

AI Supported Search For Notes

19.03.2026

Muhammad Zohaib Irshad

Full Stack TypeScript Developer

Pakistan

Air University Islamabad

1. Introduction

I am Muhammad Zohaib Irshad, a mid-senior full-stack software engineer based in Islamabad, Pakistan. I am currently completing my Bachelor's degree in Software Engineering at Air University Islamabad.

Contact Information:

Name: Muhammad Zohaib Irshad

Email: zohaibirshad678@gmail.com

GitHub Profile URL: https://github.com/developerzohaib786

LinkedIn Profile URL: https://linkedin.com/in/developerzohaib

Address: Islamabad, Pakistan, Asia.

Education:

University: Air University Islamabad

Degree: Bachelor's in Software Engineering

Programming Experience

| Area | Technologies |
| --- | --- |
| Frontend | JavaScript, TypeScript, React, Next.js, HTML5, CSS3, Tailwind CSS, shadcn/ui |
| Backend | Node.js, Express.js, NestJS, Prisma, JWT, Socket.IO |
| Generative AI | RAG systems, LangChain.js, Qdrant vector DB, Cohere API, Gemini API, Reddit API |
| Databases | Vector database (Qdrant), MongoDB, PostgreSQL, NeonDB, MySQL |
| System Design | Redis, Kafka, BullMQ, rate limiting, caching, server clustering |

I have been working in the tech industry since 2023 and have almost three years of experience. I have also contributed to other organisations, as listed in the Pull Requests and Relevant Work section.

Tech Industry Experience:

| Company | Role | Timeline |
| --- | --- | --- |
| SyncaAI | Full Stack TypeScript Intern | Jul 2025 - Sept 2025 |
| Softechnova Enterprises | MERN Stack Intern | Jun 2025 - Jul 2025 |
| SARTE Digital Marketing | SEO Expert and WordPress Content Writer | Oct 2023 - Sept 2024 |

Open source experience

I actively contribute to open source projects, including the Apache polaris-tools and Apache Doris repositories, FOSSASIA, and the Links-Hub repository. I understand how to navigate large, established codebases, communicate through PRs, and follow project contribution standards.

I have mentioned my Open source work in the Pull Requests and Relevant Work Section.

2. Project Summary

Problem

As a Joplin user's note collection grows, keyword search becomes unreliable. Users often remember the context or topic of a note but not the exact words it contains, making the existing search engine insufficient for large knowledge bases.

How I Will Implement It

The plugin will index all notes into a local sqlite-vec vector database at startup, with incremental updates when notes change. When a user types a natural-language query in the search box, the plugin will embed the query using an LLM embedding model and perform a similarity search against the indexed notes, returning the most semantically relevant results.

The user's own LLM API key will be configured in the plugin settings. The AI search will sit alongside Joplin's existing keyword search: natural-language queries go through the AI engine, while standard search syntax continues to use the existing engine unchanged.

Expected Outcome

A Joplin plugin that provides semantic, natural language search over the user's note collection. Users can describe what they are looking for in plain English and receive accurate, contextually relevant results even when they cannot recall specific keywords. The AI search supplements rather than replaces the existing search engine.

3. Technical Approach

3.1 Why a Plugin and Not a Core Feature

Implementing this as a plugin rather than modifying Joplin's core is the right approach for several reasons.

First, it keeps the core application lightweight: not every user needs or wants AI-powered search, and bundling it into core would force everyone to carry that dependency.

Second, a plugin can be iterated on and updated independently of Joplin's main release cycle, meaning improvements can ship faster.

Third, it avoids the complexity of modifying core search infrastructure which would require a much deeper understanding of the entire codebase and carries a higher risk of introducing regressions. A plugin is self contained, reversible, and easier for mentors to review and merge.

3.2 How This Differs from Existing Plugins

To the best of my knowledge, no existing Joplin plugin implements full semantic vector search over the entire note collection with a local file-based vector store.

Some existing AI plugins provide note summarisation or simple LLM prompting but they do not index the collection into embeddings or perform similarity based retrieval.

The key difference in this proposal is that every note is pre-processed into a vector representation at index time, which means search at query time is nearly instant regardless of how many notes the user has. The system does not send all notes to the LLM on every search; it sends only the most relevant ones, already identified via vector similarity.

3.3 Technology Stack of the Plugin

The plugin will be written entirely in TypeScript, consistent with Joplin's plugin ecosystem. For the vector database, sqlite-vec will serve as the default storage layer: a SQLite extension that requires zero external setup from the user, stores its data in a single file alongside the Joplin database, and works on all platforms Joplin supports.

For the embedding model, the plugin will call the user's configured LLM provider (OpenAI by default) to convert note chunks and search queries into vector embeddings.

For local and offline support, Ollama will be offered as an alternative provider, allowing users who do not want to share data with a cloud API to run everything on their own machine. The plugin UI will be built in React using the Joplin panel API.

| Component | Technology | Details |
| --- | --- | --- |
| Language | TypeScript | Consistent with Joplin's entire plugin ecosystem |
| Vector Database | sqlite-vec | SQLite extension. Zero setup, single file, works on all platforms Joplin supports |
| Default Embedding Provider | OpenAI | Converts note chunks and search queries into vector embeddings |
| Local / Offline Provider | Ollama | Fully offline alternative. No data sent to any cloud API |
| Plugin UI | React | Built using the official Joplin panel API |

3.4 Architecture

The architecture works in two distinct phases that are worth understanding separately because they happen at different times.

3.4.1 Indexing Phase:

The first phase is indexing, which happens once at setup and then incrementally whenever notes change. The plugin fetches all notes through the Joplin Data API and splits each note's content into overlapping text chunks (for example, 500 tokens per chunk with a 100-token overlap to preserve context across chunk boundaries). It then sends each chunk to the embedding model to get a vector: essentially a list of numbers that captures the semantic meaning of that text. These vectors are stored in sqlite-vec alongside metadata such as the note ID, title, and chunk position. A SHA-256 hash of each note's content is stored so the plugin can detect which notes have changed and re-index only those, avoiding unnecessary API calls.
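The two pieces above, overlapping chunking and hash-based change detection, can be sketched as follows. This is illustrative only: token counts are approximated here by whitespace-split words, and the real plugin would use the tokenizer matching the chosen embedding model.

```typescript
import { createHash } from "crypto";

interface Chunk {
  noteId: string;
  position: number; // chunk index within the note
  text: string;
}

// Split a note body into overlapping chunks. "Tokens" are approximated by
// whitespace-split words in this sketch.
function chunkNote(noteId: string, body: string, size = 500, overlap = 100): Chunk[] {
  const words = body.split(/\s+/).filter(w => w.length > 0);
  const step = size - overlap; // each chunk starts `step` words after the previous one
  const chunks: Chunk[] = [];
  for (let start = 0, pos = 0; start < words.length; start += step, pos++) {
    chunks.push({ noteId, position: pos, text: words.slice(start, start + size).join(" ") });
    if (start + size >= words.length) break; // final chunk reached the end of the note
  }
  return chunks;
}

// Change detection: a note is re-indexed only when its content hash differs
// from the hash stored at the previous indexing run.
function contentHash(body: string): string {
  return createHash("sha256").update(body, "utf8").digest("hex");
}

function needsReindex(body: string, storedHash: string | undefined): boolean {
  return storedHash === undefined || contentHash(body) !== storedHash;
}
```

With a chunk size of 500 and an overlap of 100, each chunk shares its last 100 words with the start of the next one, so text straddling a boundary is always fully contained in at least one chunk.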

The vectors and metadata are stored in a single SQLite table:
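As a rough sketch of what that storage could look like, assuming sqlite-vec's vec0 virtual table and a 1536-dimension model such as OpenAI's text-embedding-3-small (in practice the vectors live in the vec0 table with chunk metadata in a small companion table keyed by the same rowid; the exact schema will be settled during implementation):

```typescript
// Hypothetical schema for the index. Dimensions assume OpenAI's
// text-embedding-3-small (1536); a different model would change this value.
const CREATE_SCHEMA_SQL = `
CREATE VIRTUAL TABLE IF NOT EXISTS chunk_vectors USING vec0(
  embedding float[1536]
);
CREATE TABLE IF NOT EXISTS chunk_meta (
  rowid INTEGER PRIMARY KEY,    -- same rowid as chunk_vectors
  note_id TEXT NOT NULL,
  note_title TEXT NOT NULL,
  chunk_position INTEGER NOT NULL,
  content_hash TEXT NOT NULL    -- SHA-256 of the full note body
);
`;

// sqlite-vec stores each vector as raw little-endian float32 bytes,
// so embeddings are serialized before insertion.
function toF32Blob(embedding: number[]): Uint8Array {
  return new Uint8Array(new Float32Array(embedding).buffer);
}
```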

Why F32BLOB and not JSON?

The vector is stored as binary because sqlite-vec performs cosine similarity calculations directly on the raw binary data without any parsing step. On a collection of 10,000 notes this difference in speed is significant.

| Aspect | F32BLOB (binary) | JSON array |
| --- | --- | --- |
| Speed | Very fast; cosine similarity runs directly in the DB | Slow; must be parsed before every comparison |
| Human readable | No | Yes |
| Used for | Storing vectors in sqlite-vec | Not suitable for vector search |

3.4.2 Querying Phase

The second phase is querying, which happens every time the user submits a natural language search. The plugin provides its own dedicated search box inside the Joplin sidebar panel. It is separated from Joplin's existing keyword search.

This keeps the two search systems independent and avoids any conflict with users who rely on Joplin's existing search syntax such as notebook: or tag: operators.

When the user types a query into the AI search box and submits it, the query text is embedded using the same model that was used during indexing.

This is important because both the query and the stored chunks need to live in the same vector space for similarity comparison to work. The resulting query vector is compared against all stored note vectors in sqlite-vec using cosine similarity. The top results are returned ranked by relevance. These results are then displayed directly inside the sidebar panel showing the note title, a relevant excerpt, and a link to open the note directly in Joplin.
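The ranking step can be illustrated in plain TypeScript. In the plugin the similarity computation runs inside sqlite-vec itself; this standalone sketch only shows the cosine-similarity and top-K logic the query phase relies on.

```typescript
interface StoredChunk {
  noteId: string;
  title: string;
  vector: number[];
}

// Cosine similarity: 1 means identical direction, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored chunk against the query vector and keep the top K.
function topK(queryVector: number[], chunks: StoredChunk[], k = 10) {
  return chunks
    .map(c => ({ noteId: c.noteId, title: c.title, score: cosineSimilarity(queryVector, c.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```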

3.5 System Design Principles

I will apply the following system design principles:

  • Incremental processing via change detection: SHA-256 hashing ensures only modified notes are re-indexed. The system never re-processes the entire collection when only one note changed, saving both time and API cost.
  • Chunking with overlap for context preservation: notes are split into overlapping chunks rather than stored as single vectors. The overlap ensures that meaning at chunk boundaries is never lost, which directly improves retrieval accuracy.
  • Encrypted API key storage: the user's API key is encrypted at rest using Joplin's secure storage. It is only decrypted in memory at the exact moment an API call is made and is never logged, cached, or exposed anywhere else in the plugin.
  • Provider abstraction via a common interface: the LLM provider sits behind a shared interface. Switching between OpenAI and Ollama is a settings change with no code changes and no disruption to either pipeline (a one-time re-index runs after a switch, since different models produce incompatible vector spaces).
  • Vector space consistency: the same embedding model must be used for both indexing and querying. If the query and stored chunks were embedded by different models they would live in different vector spaces, and cosine-similarity comparison would produce meaningless results.
  • Background processing: indexing runs in a Worker thread, keeping the main UI thread free. The user can continue using Joplin normally while thousands of notes are being indexed in the background.
  • Separation of concerns: indexing and querying are built as two completely independent subsystems. Each does one job and has no knowledge of the other's internal logic.
  • Graceful degradation: if the API is unreachable or the key is invalid, the plugin does not crash. It falls back to Joplin's existing keyword search and shows the user a clear message explaining what went wrong.
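The provider abstraction in particular can be sketched as a small interface. All names here are illustrative and would be refined with mentors; the point is that OpenAI and Ollama adapters are interchangeable behind one contract.

```typescript
// Illustrative provider contract: indexing and querying depend only on this
// interface, never on a concrete provider.
interface EmbeddingProvider {
  readonly id: string;
  readonly dimensions: number; // must match the sqlite-vec schema
  embed(texts: string[]): Promise<number[][]>;
}

// Stub factory used here in place of real OpenAI/Ollama adapters, which would
// perform HTTP calls with the user's configured credentials.
function stubProvider(id: string, dimensions: number): EmbeddingProvider {
  return {
    id,
    dimensions,
    async embed(texts) {
      return texts.map(() => new Array(dimensions).fill(0));
    },
  };
}

// Switching providers is a settings lookup, not a code change.
function selectProvider(providerId: string, registry: EmbeddingProvider[]): EmbeddingProvider {
  const provider = registry.find(p => p.id === providerId);
  if (!provider) throw new Error(`Unknown embedding provider: ${providerId}`);
  return provider;
}
```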

4. Implementation Plan

The project is scoped at 350 hours across the GSoC 2026 coding period, May 26 to August 23. Development follows a deliberate sequence: scaffolding first, then indexing, then querying, then UI and polish. I will ensure a working prototype exists by midterm. The community bonding period from May 1 to May 25 will be used to study the Joplin plugin API and validate critical technical decisions with mentors before coding begins.

Week 1–2: June 2 to June 15 · Plugin Scaffold & Settings

| Task | Type |
| --- | --- |
| Bootstrap plugin project with TypeScript and Webpack | Required |
| Implement settings panel: API key input, model selector, Ollama URL field | Required |
| Register the dedicated AI search sidebar panel and confirm it renders in Joplin desktop | Required |
| Set up Jest with a basic test runner for the project | Required |
| Write unit tests for settings persistence | Required |

Week 3–4: June 16 to June 29 · Note Ingestion & SHA-256 Change Detection

| Task | Type |
| --- | --- |
| Implement paginated note fetcher using Joplin Data API | Required |
| Store SHA-256 hash per note to detect changed content | Required |
| Build the incremental indexing logic: skip unchanged, re-process modified | Required |
| Unit tests for the change detection module with varied note states | Required |

Week 5–6: June 30 to July 13 · Chunking & Embedding

| Task | Type |
| --- | --- |
| Build Markdown-aware chunker: 500-token chunks, 100-token overlap | Required |
| Integrate OpenAI text-embedding-3-small via provider interface | Required |
| Implement Ollama embedding adapter behind the same interface | Required |
| Set up sqlite-vec as the embedded vector store with F32BLOB(1536) schema | Required |
| Run indexing in a background Worker thread with progress reporting to the panel | Required |
| Integration tests for end-to-end indexing flow on a sample note collection | Required |

Midterm Evaluation: July 14 to July 18

Expected deliverable: A working indexing pipeline that fetches, chunks, embeds, and stores notes in sqlite-vec, with a basic sidebar panel that accepts a query and returns raw matched chunks without a fully polished UI.

Week 7–8: July 15 to July 28 · Query Pipeline

| Task | Type |
| --- | --- |
| Build query embedder using the same provider interface as indexing | Required |
| Implement cosine similarity search against sqlite-vec | Required |
| Build top-K ranking and result extraction with note title and excerpt | Required |
| Ollama query support using local model | Required |

Week 9–10: July 29 to August 11 · Search UI & Results Panel

| Task | Type |
| --- | --- |
| Build the dedicated AI search box in the React sidebar panel | Required |
| Display results: note title, relevant excerpt, link to open note in Joplin | Required |
| Add loading indicator while query is being processed | Required |
| Graceful fallback: show clear error message if API is unreachable | Required |
| Encrypted API key storage using Joplin secure storage | Required |
| Keyboard shortcut to focus the AI search box | Required |

Week 11–12: August 12 to August 22 · Performance, Edge Cases & Testing

| Task | Type |
| --- | --- |
| Benchmark indexing on a collection of 10,000+ notes | Required |
| Handle edge cases: empty notes, very short notes, non-English content | Required |
| API rate limiting between embedding calls to avoid hitting provider limits | Required |
| ARIA attributes on the search panel for screen-reader accessibility | Required |
| End-to-end integration test on a real Joplin database | Required |
| Support filtering results by notebook or tag | Optional |

Final Phase: August 23 to September 1

| Task | Type |
| --- | --- |
| Write user-facing documentation: setup guide, privacy FAQ, configuration reference | Required |
| Write architecture documentation for future contributors | Required |
| Complete unit and integration test coverage | Required |
| Record a short demo video showing the plugin in action | Required |
| Final code review with mentors Laurent and Marph; address all feedback | Required |
| Submit plugin to the Joplin plugin marketplace | Required |

5. Deliverables

At the end of the GSoC period the following will exist as working, tested, and documented outputs. Deliverables are grouped by subsystem so it is clear what each part of the project produces. Required items represent the minimum successful outcome of the project. Optional items will be completed if time permits after all required deliverables are finished.

Core Plugin

| Deliverable | Description |
| --- | --- |
| Joplin plugin package | Installable .jpl plugin published to the Joplin plugin marketplace |
| Plugin settings panel | API key input, model selector, Ollama URL field with encrypted key storage |
| Dedicated AI search sidebar | React-based sidebar panel with its own search box, separate from Joplin's existing search |

6. Availability

I am fully available for the entire GSoC 2026 coding period with no competing employment, internship, or academic commitments. I treat GSoC as a full time engagement. If I encounter a blocker I will raise it on the forum the same day rather than waiting. I will maintain a public weekly progress post so mentors and the community can track progress and give feedback at every stage of the project.

| Item | Details |
| --- | --- |
| Weekly availability | 40–45 hours per week during the coding period |
| Time zone | PKT, UTC+5 (Islamabad, Pakistan) |
| Mentor overlap | Morning PKT overlaps with European business hours, allowing daily async communication with mentors Laurent and Marph |
| Communication style | Daily check-in on the Joplin forum; code pushed or a PR opened every 2–3 days; brief public weekly progress update posted to the forum |
| Other commitments | No other employment or internship is planned during the GSoC period. University summer schedule is free of coursework obligations |
| Known absences | None currently planned. Any unavoidable absence will be communicated to mentors at least one week in advance |


7. Pull Requests & Relevant Work

Contributions made in Joplin:

| Contribution | Status | Links |
| --- | --- | --- |
| Joplin Word Count, Spell Check & Reading Metrics | Completed | [GitHub Repository] [npm Package] [Video Demonstration] |
| AI Note Assistant - Chat On Your Notes | Completed | [GitHub Repository] [npm Package] [Video Demonstration] |
| Pull Request | Awaiting Review | [Pull Request] |
| Pull Request | Closed | [Pull Request] |

Contributions in Apache and FOSSASIA organisations:

| Contribution | Status | Links |
| --- | --- | --- |
| Pull Request (Complex Backend Issue Solved) | Merged | [Pull Request] |
| Pull Request | Merged | [Pull Request] |
| Pull Request | Awaiting Review | [Pull Request] |
| Pull Request | Awaiting Review | [Pull Request] |
| Pull Request | Awaiting Review | [Pull Request] |

My Personal Projects Related to this Plugin:

| Project | GitHub Links | Tech Stack |
| --- | --- | --- |
| RAG Pipeline | [Frontend Repo] [Backend Repo] [Live Link] | |
| AI + Reddit Analysis Based Ecommerce Store | [GitHub Repo] [Live Link] | TypeScript, Node.js, Next.js, Reddit API, Gemini API, MongoDB, NextAuth |
| LeaderBoard Sphere | [Frontend Repo] [Backend Repo] | Next.js, Node.js, Redis, Kafka, Prisma, Socket.IO |

Thanks for the proposal. It would be useful to focus more on Joplin-specific constraints, especially around indexing large collections, API cost, and how the plugin behaves on first run. Clarifying these practical aspects would strengthen the proposal.
