AI Supported Search For Notes
19.03.2026
─
Muhammad Zohaib Irshad
Full Stack TypeScript Developer
Pakistan
Air University Islamabad
1. Introduction
I am Muhammad Zohaib Irshad, a mid-senior full-stack software engineer based in Islamabad, Pakistan. I am currently completing my Bachelor's degree in Software Engineering at Air University Islamabad.
Contact Information:
Name: Muhammad Zohaib Irshad
Email: zohaibirshad678@gmail.com
GitHub Profile URL: github.com/developerzohaib786
LinkedIn Profile URL: https://linkedin.com/in/developerzohaib
Address: Islamabad, Pakistan.
Education:
University: Air University Islamabad
Degree: Bachelor's in Software Engineering
Programming Experience
| Area | Technologies |
|---|---|
| Frontend | JavaScript, TypeScript, React, Next.js, HTML5, CSS3, Tailwind CSS, shadcn/ui |
| Backend | Node.js, Express, NestJS, Prisma, JWT, Socket.io |
| Generative AI | RAG systems, LangChain.js, Qdrant vector DB, Cohere API, Gemini API, Reddit API |
| Databases | Vector database (Qdrant), MongoDB, PostgreSQL, NeonDB, MySQL |
| System Design | Redis, Kafka, BullMQ, rate limiting, caching, server clustering |
I have been working in the tech industry since 2023 and have almost three years of experience. I have also contributed to other organisations, as detailed in the Pull Requests & Relevant Work section.
Tech Industry Experience:
| Company | Role | Timeline |
|---|---|---|
| SyncaAI | Full Stack TypeScript Intern | Jul 2025 - Sept 2025 |
| Softechnova Enterprises | MERN Stack Intern | Jun 2025 - Jul 2025 |
| SARTE Digital Marketing | SEO Expert and WordPress Content Writer | Oct 2023 - Sept 2024 |
Open Source Experience
I actively contribute to open-source projects including the Apache polaris-tools and Apache Doris repositories, the FOSSASIA organisation, and the Links-Hub repository. I understand how to navigate large, established codebases, communicate through PRs, and follow project contribution standards.
My open-source work is listed in the Pull Requests & Relevant Work section.
2. Project Summary
Problem
As a Joplin user's note collection grows, keyword search becomes unreliable. Users often remember the context or topic of a note but not the exact words it contains, making the existing search engine insufficient for large knowledge bases.
How I Will Implement It
The plugin will index all notes into a local sqlite-vec vector database at startup, with incremental updates when notes change. When a user types a natural-language query in the search box, the plugin will embed the query using an LLM embedding model and perform a similarity search against the indexed notes, returning the most semantically relevant results.
The user's own LLM API key will be configured in the plugin settings. The AI search will sit alongside Joplin's existing keyword search rather than replacing it: natural-language queries go to the AI engine through its own dedicated search box, while Joplin's standard search syntax continues to use the existing engine unchanged.
Expected Outcome
A Joplin plugin that provides semantic, natural language search over the user's note collection. Users can describe what they are looking for in plain English and receive accurate, contextually relevant results even when they cannot recall specific keywords. The AI search supplements rather than replaces the existing search engine.
3. Technical Approach
3.1 Why a Plugin and Not a Core Feature
Implementing this as a plugin rather than modifying Joplin's core is the right approach for several reasons.
First, it keeps the core application lightweight: not every user needs or wants AI-powered search, and bundling it into core would force everyone to carry that dependency.
Second, a plugin can be iterated on and updated independently of Joplin's main release cycle, meaning improvements can ship faster.
Third, it avoids the complexity of modifying core search infrastructure, which would require a much deeper understanding of the entire codebase and carries a higher risk of introducing regressions. A plugin is self-contained, reversible, and easier for mentors to review and merge.
3.2 How This Differs from Existing Plugins
To my knowledge, no existing Joplin plugin implements full semantic vector search over the entire note collection with a local file-based vector store.
Some existing AI plugins provide note summarisation or simple LLM prompting, but they do not index the collection into embeddings or perform similarity-based retrieval.
The key difference in this proposal is that every note is pre-processed into a vector representation at index time, which means search at query time is nearly instant regardless of how many notes the user has. The system does not send all notes to the LLM on every search; it only sends the most relevant chunks, which it has already found via vector similarity.
3.3 Technology Stack of the Plugin
The plugin will be written entirely in TypeScript, consistent with Joplin's plugin ecosystem. For the vector database, sqlite-vec will serve as the default storage layer: it is a SQLite extension that requires zero external setup from the user, stores its data in a single file alongside the Joplin database, and works on all platforms Joplin supports.
For the embedding model, the plugin will call the user's configured LLM provider (OpenAI by default) to convert note chunks and search queries into vector embeddings.
For local and offline support, Ollama will be offered as an alternative provider, allowing users who do not want to share data with a cloud API to run everything on their own machine. The plugin UI will be built in React using the Joplin panel API.
| Component | Technology | Details |
|---|---|---|
| Language | TypeScript | Consistent with Joplin's entire plugin ecosystem |
| Vector Database | sqlite-vec | SQLite extension. Zero setup, single file, works on all platforms Joplin supports |
| Default Embedding Provider | OpenAI | Converts note chunks and search queries into vector embeddings |
| Local / Offline Provider | Ollama | Fully offline alternative. No data sent to any cloud API |
| Plugin UI | React | Built using the official Joplin panel API |
3.4 Architecture
The architecture works in two distinct phases that are worth understanding separately because they happen at different times.
3.4.1 Indexing Phase:
The first phase is indexing, which happens once at setup and then incrementally whenever notes change. The plugin fetches all notes through the Joplin Data API, splits each note's content into overlapping text chunks (for example, 500 tokens per chunk with a 100-token overlap to preserve context across chunk boundaries), and sends each chunk to the embedding model to get a vector: essentially a list of numbers that captures the semantic meaning of that text. These vectors are stored in sqlite-vec alongside metadata such as the note ID, title, and chunk position. A SHA-256 hash of each note's content is also stored so the plugin can detect which notes have changed and re-index only those, avoiding unnecessary API calls.
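The chunking step above can be sketched as follows. This is a minimal sketch: the 500/100 figures are the proposal's defaults, and token counting is approximated here by whitespace-separated words, whereas a real implementation would use the embedding model's tokenizer.

```typescript
// Split text into overlapping windows. Token counting is approximated with
// whitespace-separated words for this sketch; a real implementation would
// count tokens with the embedding model's tokenizer.
function chunkText(text: string, chunkSize = 500, overlap = 100): string[] {
  const step = chunkSize - overlap; // how far the window advances each time
  if (step <= 0) throw new Error("overlap must be smaller than chunk size");
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= words.length) break; // last window reached the end
  }
  return chunks;
}
```

Because consecutive windows share `overlap` tokens, a sentence that straddles a chunk boundary still appears whole in at least one chunk, which is what preserves context for retrieval.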
The vectors and metadata are stored in a single SQLite table:
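A minimal sketch of that table is shown below. Table and column names are illustrative, and the exact vector column declaration depends on the sqlite-vec version in use; the real schema may differ.

```sql
-- Illustrative schema; names and exact types are assumptions, not final.
CREATE TABLE IF NOT EXISTS note_chunks (
  id INTEGER PRIMARY KEY,
  note_id TEXT NOT NULL,          -- Joplin note ID
  title TEXT NOT NULL,            -- note title at index time
  chunk_index INTEGER NOT NULL,   -- position of the chunk within the note
  content_hash TEXT NOT NULL,     -- SHA-256 of the full note body, for change detection
  embedding F32BLOB(1536)         -- raw float32 vector, the binary format sqlite-vec operates on
);
```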
Why F32BLOB and not JSON?
The vector is stored as binary because sqlite-vec performs cosine similarity calculations directly on the raw binary data without any parsing step. On a collection of 10,000 notes this difference in speed is significant.
| Stats | F32BLOB (binary) | JSON array |
|---|---|---|
| Speed | Very fast; cosine similarity runs directly in the DB | Slow; must be parsed before every comparison |
| Human readable | No | Yes |
| Used for | Storing vectors in sqlite-vec | Not suitable for vector search |
3.4.2 Querying Phase
The second phase is querying, which happens every time the user submits a natural-language search. The plugin provides its own dedicated search box inside a Joplin sidebar panel, separate from Joplin's existing keyword search.
This keeps the two search systems independent and avoids any conflict with users who rely on Joplin's existing search syntax such as notebook: or tag: operators.
When the user types a query into the AI search box and submits it, the query text is embedded using the same model that was used during indexing.
This is important because both the query and the stored chunks need to live in the same vector space for similarity comparison to work. The resulting query vector is compared against all stored note vectors in sqlite-vec using cosine similarity. The top results are returned ranked by relevance. These results are then displayed directly inside the sidebar panel showing the note title, a relevant excerpt, and a link to open the note directly in Joplin.
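The ranking step can be sketched in plain TypeScript. In the plugin this computation runs inside sqlite-vec, but the functions below show what is being calculated; all names are illustrative.

```typescript
// Cosine similarity between two equal-length vectors: the dot product
// normalised by both magnitudes, so the result is in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank all stored chunk vectors against the query vector and keep the top K.
function topK(
  query: number[],
  chunks: { noteId: string; vector: number[] }[],
  k: number,
): { noteId: string; score: number }[] {
  return chunks
    .map((c) => ({ noteId: c.noteId, score: cosineSimilarity(query, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```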
3.5 System Design Principles
I will apply the following system design principles:
- Incremental Processing via Change Detection: SHA-256 hashing ensures only modified notes are re-indexed. The system never re-processes the entire collection when only one note has changed, saving both time and API cost.
- Chunking with Overlap for Context Preservation: notes are split into overlapping chunks rather than stored as single vectors. The overlap ensures that meaning at chunk boundaries is never lost, which directly improves retrieval accuracy.
- Encrypted API Key Storage: the user's API key is encrypted at rest using Joplin's secure storage. It is decrypted in memory only at the moment an API call is made and is never logged, cached, or exposed anywhere else in the plugin.
- Provider Abstraction via a Common Interface: the LLM provider sits behind a shared interface, so switching between OpenAI and Ollama is a settings change with no code changes. Switching embedding models does trigger a re-index, so that stored vectors and future queries stay in the same vector space.
- Vector Space Consistency: the same embedding model must be used for both indexing and querying. If the query and stored chunks were embedded by different models, they would live in different vector spaces and cosine similarity comparison would produce meaningless results.
- Background Processing: indexing runs in a Worker thread, keeping the main UI thread free. The user can continue using Joplin normally while thousands of notes are indexed in the background.
- Separation of Concerns: indexing and querying are built as two completely independent subsystems. Each does one job and has no knowledge of the other's internal logic.
- Graceful Degradation: if the API is unreachable or the key is invalid, the plugin does not crash. It falls back to Joplin's existing keyword search and shows the user a clear message explaining what went wrong.
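The provider abstraction principle can be sketched as a small TypeScript interface. Names here are illustrative, and the stub adapter stands in for a real OpenAI or Ollama HTTP client.

```typescript
// Shared interface that every embedding backend implements; the rest of the
// plugin depends only on this, never on a concrete provider.
interface EmbeddingProvider {
  /** Embeds a batch of texts, returning one vector per input text. */
  embed(texts: string[]): Promise<number[][]>;
  /** Identifies the model so the index can be invalidated if it changes. */
  modelId(): string;
}

// Stub adapter for unit tests. A real adapter would call the OpenAI or
// Ollama embeddings endpoint here instead of fabricating vectors.
class FakeProvider implements EmbeddingProvider {
  async embed(texts: string[]): Promise<number[][]> {
    return texts.map((t) => [t.length, 0, 0]); // deterministic dummy vectors
  }
  modelId(): string {
    return "fake-test-model";
  }
}
```

Because the indexing and query pipelines both receive an `EmbeddingProvider`, tests can run against `FakeProvider` without any network access, and a settings change simply swaps which concrete adapter is constructed.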
4. Implementation Plan
The project is scoped at 350 hours across the GSoC 2026 coding period, May 26 to August 23. Development follows a deliberate sequence: scaffolding first, then indexing, then querying, then UI and polish, with a working prototype in place by midterm. The community bonding period from May 1 to May 25 will be used to study the Joplin plugin API and validate critical technical decisions with mentors before coding begins.
Week 1–2: June 2 to June 15 · Plugin Scaffold & Settings
| Task | Type |
|---|---|
| Bootstrap plugin project with TypeScript and Webpack | Required |
| Implement settings panel: API key input, model selector, Ollama URL field | Required |
| Register the dedicated AI search sidebar panel and confirm it renders in Joplin desktop | Required |
| Set up Jest with a basic test runner for the project | Required |
| Write unit tests for settings persistence | Required |
Week 3–4: June 16 to June 29 · Note Ingestion & SHA-256 Change Detection
| Task | Type |
|---|---|
| Implement paginated note fetcher using Joplin Data API | Required |
| Store SHA-256 hash per note to detect changed content | Required |
| Build the incremental indexing logic: skip unchanged, re-process modified | Required |
| Unit tests for the change detection module with varied note states | Required |
Week 5–6: June 30 to July 13 · Chunking & Embedding
| Task | Type |
|---|---|
| Build Markdown aware chunker: 500 token chunks, 100 token overlap | Required |
| Integrate OpenAI text-embedding-3-small via provider interface | Required |
| Implement Ollama embedding adapter behind the same interface | Required |
| Set up sqlite-vec as the embedded vector store with F32BLOB(1536) schema | Required |
| Run indexing in a background Worker thread with progress reporting to the panel | Required |
| Integration tests for end to end indexing flow on a sample note collection | Required |
Midterm Evaluation: July 14 to July 18
Expected deliverable: A working indexing pipeline that fetches, chunks, embeds, and stores notes in sqlite-vec, with a basic sidebar panel that accepts a query and returns raw matched chunks without a fully polished UI.
Week 7–8: July 15 to July 28 · Query Pipeline
| Task | Type |
|---|---|
| Build query embedder using the same provider interface as indexing | Required |
| Implement cosine similarity search against sqlite-vec | Required |
| Build top K ranking and result extraction with note title and excerpt | Required |
| Ollama query support using local model | Required |
Week 9–10: July 29 to August 11 · Search UI & Results Panel
| Task | Type |
|---|---|
| Build the dedicated AI search box in the React sidebar panel | Required |
| Display results: note title, relevant excerpt, link to open note in Joplin | Required |
| Add loading indicator while query is being processed | Required |
| Graceful fallback: show clear error message if API is unreachable | Required |
| Encrypted API key storage using Joplin secure storage | Required |
| Keyboard shortcut to focus the AI search box | Required |
Week 11–12: August 12 to August 22 · Performance, Edge Cases & Testing
| Task | Type |
|---|---|
| Benchmark indexing on a collection of 10,000+ notes | Required |
| Handle edge cases: empty notes, very short notes, non-English content | Required |
| API rate limiting between embedding calls to avoid hitting provider limits | Required |
| ARIA attributes on the search panel for screen-reader accessibility | Required |
| End to end integration test on a real Joplin database | Required |
| Support filtering results by notebook or tag | Optional |
Final Phase: August 23 to September 1
| Task | Type |
|---|---|
| Write user-facing documentation: setup guide, privacy FAQ, configuration reference | Required |
| Write architecture documentation for future contributors | Required |
| Complete unit and integration test coverage | Required |
| Record a short demo video showing the plugin in action | Required |
| Final code review with mentors Laurent and Marph; address all feedback | Required |
| Submit plugin to the Joplin plugin marketplace | Required |
5. Deliverables
At the end of the GSoC period the following will exist as working, tested, and documented outputs. Deliverables are grouped by subsystem so it is clear what each part of the project produces. Required items represent the minimum successful outcome of the project. Optional items will be completed if time permits after all required deliverables are finished.
Core Plugin
| Deliverable | Description |
|---|---|
| Joplin plugin package | Installable .jpl plugin published to the Joplin plugin marketplace |
| Plugin settings panel | API key input, model selector, Ollama URL field with encrypted key storage |
| Dedicated AI search sidebar | React-based sidebar panel with its own search box, separate from Joplin's existing search |
6. Availability
I am fully available for the entire GSoC 2026 coding period with no competing employment, internship, or academic commitments, and I treat GSoC as a full-time engagement. If I encounter a blocker I will raise it on the forum the same day rather than waiting, and I will maintain a public weekly progress post so mentors and the community can track progress and give feedback at every stage of the project.
| Item | Details |
|---|---|
| Weekly availability | 40–45 hours per week during the coding period |
| Time zone | PKT UTC+5 (Islamabad, Pakistan) |
| Mentor overlap | Morning PKT overlaps with European business hours, allowing daily async communication with mentors Laurent and Marph |
| Communication style | Daily check-in on Joplin forum; code pushed or a PR opened every 2–3 days; brief public weekly progress update posted to the forum |
| Other commitments | No other employment or internship is planned during the GSoC period. University summer schedule is free of coursework obligations |
| Known absences | None currently planned. Any unavoidable absence will be communicated to mentors at least one week in advance |
7. Pull Requests & Relevant Work
Contributions made in Joplin:
| Contribution | Status | Links |
|---|---|---|
| Joplin Word Count, Spell Check & Reading Metrics | Completed | [Github Repository] [Npm Package] [Video Demonstration] |
| AI Note Assistant - Chat On Your Notes | Completed | [Github Repository] [Npm Package] [Video Demonstration] |
| Pull Request | Awaiting Review | [Pull Request] |
| Pull Request | Closed | [Pull Request] |
Contributions in Apache and FOSSASIA organisations:
| Contribution | Status | Links |
|---|---|---|
| Pull Request (Complex Backend Issue Solved) | Merged | [Pull Request] |
| Pull Request | Merged | [Pull Request] |
| Pull Request | Awaiting Review | [Pull Request] |
| Pull Request | Awaiting Review | [Pull Request] |
| Pull Request | Awaiting Review | [Pull Request] |
My Personal Projects Related to this Plugin:
| Project | Github Links | Tech Stack |
|---|---|---|
| RAG Pipeline | [Frontend Repo] [Backend Repo] [Live Link] | |
| AI + Reddit Analysis Based E-commerce Store | [Github Repo] [Live Link] | TypeScript, Node.js, Next.js, Reddit API, Gemini API, MongoDB, NextAuth |
| LeaderBoard Sphere | [Frontend Repo] [Backend Repo] | Next.js, Node.js, Redis, Kafka, Prisma, Socket.io |




