Links:
GitHub profile: justin212407 (Justin Charles)
Contributions to Joplin:
- #14547: Fix ++insert++ syntax rendering in Markdown (Merged)
- #14563: Prevent unclosed frontmatter from breaking Markdown rendering (Merged)
- #14626: Add plugin website link to /help (Merged)
- #14634: Application crashes when deleting a notebook (Closed but considered for GSoC)
- #14674: Hide new note/todo buttons when no notebook exists (Merged)
- #14692: Always require password confirmation when changing master password after encryption (Merged)
- #14824: Add warning dialog before JEX import to prevent note duplication (Approved)
1. Introduction
I am Justin Charles, a B.Tech Information Science Engineering student with a strong interest in full-stack systems and distributed architectures. I am a year-long contributor at Sugar Labs and a C4GT DMP 2025 contributor, where I have worked on the Music Blocks v4 project. Through this, I have gained experience collaborating on large open source codebases and understanding project architecture.
2. Project Summary
Joplin users who maintain large note collections over months and years face a compounding organizational problem: tags become inconsistent, notes land in the wrong notebooks, and hundreds of old notes silently accumulate without ever being revisited. No current Joplin feature addresses this systematically.
This project will build a privacy-first AI categorization plugin that analyses a user's entire note collection, discovers semantic groupings, and proposes concrete organizational actions (creating tags, filing notes into notebooks, and surfacing archive candidates) which the user reviews and approves before anything changes. The system is designed around a strict suggest -> confirm -> apply contract: the AI never modifies a note without explicit user permission.
The implementation follows a plugin-first architecture, leveraging joplin.data for all read/write operations against the Joplin internal REST layer (packages/lib/services/rest/routes/). The AI pipeline defaults to a fully local Ollama backend for privacy, with a transformers.js fallback requiring no external dependencies, and an optional opt-in path for users who wish to use frontier model APIs. A phased timeline ensures a working MVP by Week 6, with the remaining weeks dedicated to polish, edge cases, and documentation.
3. Technical Approach and Implementation
Deliverable 1 - Core Plugin Infrastructure & Incremental Indexing Pipeline
What it is
The foundational layer of the entire system. This deliverable establishes the plugin scaffold, the paginated note fetching mechanism, the hash-based incremental index, and the persistence layer that stores per-note embedding metadata between sessions. Nothing else in the project works without this.
Approach in depth
Plugin scaffold is generated with yo generator-joplin and structured into three top-level modules: indexer, analyser, and ui. The plugin registers a toolbar command and a panel during onStart.
Paginated note fetching uses joplin.data.get with limit: 100 and page increments until has_more is false. Fields fetched are id, title, body, parent_id, updated_time. This mirrors the internal REST route at packages/lib/services/rest/routes/notes.ts.
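The fetch loop described above can be sketched as follows. Since joplin.data only exists inside a running plugin, a fetchPage callback stands in for the real API call; the Note fields, limit, and loop structure mirror the description, while the function and type names are illustrative.

```typescript
// Sketch of the pagination loop. fetchPage stands in for
// joplin.data.get(['notes'], { fields, limit, page }), which is only
// available inside a running Joplin plugin.
type Note = { id: string; title: string; body: string; parent_id: string; updated_time: number };
type Page = { items: Note[]; has_more: boolean };

async function fetchAllNotes(
  fetchPage: (page: number, limit: number) => Promise<Page>,
  limit = 100,
): Promise<Note[]> {
  const notes: Note[] = [];
  let page = 1;
  while (true) {
    const res = await fetchPage(page, limit);
    notes.push(...res.items);
    if (!res.has_more) break; // the REST layer signals the last page
    page += 1;
  }
  return notes;
}
```

Accumulating pages behind a callback like this also makes the indexer testable without a live Joplin instance.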
Hash-based incremental indexing avoids re-embedding notes that haven't changed. On each run, the plugin computes an MD5 hash of each note's body and compares it against a stored hash. Only changed or new notes are sent to the embedding backend. This keeps the plugin fast and non-intrusive on large vaults.
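A minimal sketch of that change-detection step, using Node's built-in crypto module; the storedHashes map stands in for the plugin-settings persistence layer, and the function names are assumptions.

```typescript
import { createHash } from 'crypto';

// Compute the per-note body hash used for change detection.
function md5(text: string): string {
  return createHash('md5').update(text, 'utf8').digest('hex');
}

// Return the ids of notes whose body changed since the last run; only these
// are sent to the embedding backend. Updates the stored hashes in place.
function notesToReembed(
  notes: { id: string; body: string }[],
  storedHashes: Map<string, string>,
): string[] {
  const changed: string[] = [];
  for (const note of notes) {
    const h = md5(note.body);
    if (storedHashes.get(note.id) !== h) {
      changed.push(note.id);
      storedHashes.set(note.id, h); // persist the new hash for the next run
    }
  }
  return changed;
}
```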
Persistence layer stores all index metadata in Joplin plugin settings using namespaced keys (index.hash.<noteId>, index.embedding.<noteId>, index.clusterId.<noteId>). This avoids touching note userData which would trigger unnecessary sync events.
Trigger mechanism hooks into joplin.workspace.onSyncComplete so the index refreshes automatically after every sync, and exposes a manual "Analyse Notes" command for on-demand runs.
Deliverable 2 - AI Embedding & Analysis Engine
What it is
The intelligence core of the plugin. This deliverable implements the three-tier embedding backend, the HDBSCAN clustering algorithm, LLM-based cluster label generation, and the centroid classifier for mapping notes to existing notebooks. This is where the raw note text becomes actionable organizational suggestions.
Approach in depth
Three-tier embedding backend is selected via a plugin settings dropdown. All three tiers produce the same output - a float32[] vector - so the rest of the pipeline is backend-agnostic.
Confidence thresholds control what gets surfaced:
- similarity ≥ 0.85 -> high confidence, shown first in UI
- 0.6 ≤ similarity < 0.85 -> medium confidence, shown with a warning badge
- similarity < 0.6 -> noise point, moved to Uncategorized holding area
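The threshold mapping can be captured in a single pure function; the cutoffs mirror the values above, while the function and tier names are illustrative.

```typescript
// Map a cosine-similarity score to a confidence tier, per the thresholds above.
type Tier = 'high' | 'medium' | 'noise';

function confidenceTier(similarity: number): Tier {
  if (similarity >= 0.85) return 'high';  // shown first in the UI
  if (similarity >= 0.6) return 'medium'; // shown with a warning badge
  return 'noise';                         // moved to the Uncategorized holding area
}
```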
Deliverable 3 - Intelligent Categorisation System (Tagging + Notebook Filing)
What it is
The layer that translates AI analysis output into concrete, validated Joplin actions. This deliverable implements the Command Dispatcher, the full LLM-to-API mapping, the duplicate-prevention logic, and the rollback mechanism that makes every action reversible within a session.
Approach in depth
Command Dispatcher is the critical bridge between LLM output and Joplin's API. The LLM is prompted to return a strictly typed JSON array. The dispatcher validates each command against a TypeScript schema before execution and refuses to run any command that fails validation. Duplicate prevention checks existing tags and notebooks before creating new ones.
This prevents the plugin from creating machine-learning, Machine Learning, and machine learning as three separate tags.
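A sketch of the validation and duplicate-prevention step. The command shapes, the normalisation rule, and the function names are assumptions; in the real dispatcher, joplin.data calls would only run after these checks pass.

```typescript
// Two of the command shapes the LLM is prompted to emit (illustrative subset).
type Command =
  | { type: 'createTag'; name: string }
  | { type: 'tagNote'; tagName: string; noteId: string };

// Reject anything that doesn't match the typed schema.
function isValidCommand(cmd: unknown): cmd is Command {
  if (typeof cmd !== 'object' || cmd === null) return false;
  const c = cmd as Record<string, unknown>;
  if (c.type === 'createTag') return typeof c.name === 'string' && c.name.length > 0;
  if (c.type === 'tagNote') return typeof c.tagName === 'string' && typeof c.noteId === 'string';
  return false;
}

// Normalise so "machine-learning", "Machine Learning" and "machine learning"
// all resolve to the same canonical tag.
function normaliseTag(name: string): string {
  return name.trim().toLowerCase().replace(/[\s_-]+/g, ' ');
}

// Drop invalid commands and createTag commands whose normalised name already exists.
function dedupeCreates(commands: Command[], existingTags: string[]): Command[] {
  const seen = new Set(existingTags.map(normaliseTag));
  return commands.filter(cmd => {
    if (!isValidCommand(cmd)) return false; // refuse to run unvalidated commands
    if (cmd.type !== 'createTag') return true;
    const key = normaliseTag(cmd.name);
    if (seen.has(key)) return false;        // tag already exists
    seen.add(key);
    return true;
  });
}
```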
Snapshot-based rollback captures the current state of every note that will be affected before dispatching any commands. If the user clicks "Undo" within the same session, the dispatcher replays the snapshot in reverse.
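The snapshot mechanism can be sketched like this; the NoteState shape and the restore callback are simplified stand-ins for joplin.data.get/put on individual notes, and the class name is illustrative.

```typescript
// Minimal per-note state captured before dispatch (illustrative subset of fields).
type NoteState = { id: string; parent_id: string; title: string };

class SnapshotRollback {
  private snapshots: NoteState[] = [];

  // Capture a deep-enough copy of every note that will be affected.
  capture(notes: NoteState[]): void {
    this.snapshots = notes.map(n => ({ ...n }));
  }

  // Replay the snapshot in reverse order; restore stands in for a
  // joplin.data.put(['notes', id], ...) call in the real plugin.
  undo(restore: (state: NoteState) => void): void {
    for (let i = this.snapshots.length - 1; i >= 0; i--) restore(this.snapshots[i]);
    this.snapshots = []; // rollback is session-scoped and single-shot
  }
}
```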
Agentic mode (stretch goal, opt-in) allows users with API keys to schedule automatic categorization runs. In agentic mode, high-confidence suggestions (≥ 0.92) are applied without UI review, and a summary notification is shown afterward. This is disabled by default and requires explicit user opt-in in settings.
Deliverable 4 - Review & Apply UI (React Panel)
What it is
The user-facing surface of the entire system. A React-based sidebar panel built with joplin.views.panels.create() that presents all AI suggestions grouped by type, lets the user approve or reject them individually or in bulk, and triggers the Command Dispatcher on confirmation.
Approach in depth
Panel architecture uses Joplin's panel API with joplin.views.panels.onMessage for two-way communication between the React webview and the plugin main process. The panel posts messages to the plugin, which executes joplin.data calls and responds with results.
Three-tab layout:
- Tags tab — groups suggestions by proposed tag. Shows note count per tag, a confidence badge, and an expandable list of affected notes. Users can approve all notes for a tag, or deselect individual notes before applying.
- Notebooks tab — shows proposed note moves with a before → after path display (Work > General → Work > Machine Learning). Users approve or reject per note.
- Archive tab — shows cold notes sorted by last activity, with last_viewed and updated_time displayed. Users can select notes to archive in bulk.
State management uses React useReducer with an actions: Command[] array and a selected: Set<string> for tracking which suggestions are approved. The "Apply Selected" button is disabled until at least one suggestion is selected.
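The reducer behind that state can be sketched as a pure function, which is also what makes it unit-testable outside React; the action names and state shape are assumptions.

```typescript
// Selection state for the panel; React's useReducer would consume this unchanged.
type State = { selected: Set<string> };
type Action =
  | { type: 'toggle'; id: string }
  | { type: 'selectAll'; ids: string[] }
  | { type: 'clear' };

function reducer(state: State, action: Action): State {
  switch (action.type) {
    case 'toggle': {
      const selected = new Set(state.selected);
      if (selected.has(action.id)) selected.delete(action.id);
      else selected.add(action.id);
      return { selected };
    }
    case 'selectAll':
      return { selected: new Set(action.ids) };
    case 'clear':
      return { selected: new Set() };
  }
}

// "Apply Selected" stays disabled until at least one suggestion is selected.
const applyDisabled = (s: State): boolean => s.selected.size === 0;
```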
"Never suggest again" rule — users can right-click any suggestion and mark it as permanently ignored. The plugin stores ignored note-tag pairs in settings and filters them from all future runs.
Deliverable 5 - Archive Discovery, Settings, and Documentation
What it is
The final production-ready layer. This deliverable implements the cold note detection system, the complete settings UI, performance optimization for large vaults (2000+ notes), and full technical documentation.
Approach in depth
Cold note detection requires solving a gap in Joplin's data model: there is no native last_viewed_time field. The plugin implements a lightweight tracking mechanism using joplin.workspace.onNoteSelectionChange. This runs silently in the background from the moment the plugin is installed, building up a last_viewed record per note stored entirely in plugin settings — not in note userData, which avoids triggering spurious sync events.
Archive criteria — all three conditions must be true:
- Date.now() - lastViewed[noteId] > viewThreshold (default: 180 days)
- Date.now() - note.updated_time > updateThreshold (default: 90 days)
- Note is not in a notebook the user has marked as "exclude from archive suggestions"
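The three conditions combine into a single predicate; the defaults mirror the criteria above, while the function and parameter names are illustrative.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// A note is an archive candidate only if all three criteria hold:
// not viewed recently, not updated recently, and not in an excluded notebook.
function isArchiveCandidate(
  note: { id: string; parent_id: string; updated_time: number },
  lastViewed: Map<string, number>,
  excludedNotebooks: Set<string>,
  now: number,
  viewThresholdDays = 180,
  updateThresholdDays = 90,
): boolean {
  const viewed = lastViewed.get(note.id) ?? 0; // never viewed counts as cold
  return (
    now - viewed > viewThresholdDays * DAY_MS &&
    now - note.updated_time > updateThresholdDays * DAY_MS &&
    !excludedNotebooks.has(note.parent_id)
  );
}
```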
Privacy settings UI presents three clearly labelled backend options with a data-handling disclosure:
| Setting | Default | Description |
|---|---|---|
| AI Backend | Ollama (local) | Where embeddings are computed |
| External API Key | Empty / disabled | Only visible if Tier 3 selected |
| Archive threshold | 180 days | How long before a note is considered cold |
| Exclude notebooks | None | Notebooks never suggested for archive |
| Agentic mode | Off | Auto-apply high-confidence suggestions |
A permanent notice in the settings panel reads: "Your note content never leaves your device unless you choose an external AI provider." The external API option is greyed out until the user clicks an acknowledgement checkbox.
Performance optimization for large vaults: the indexer runs note fetching and embedding in batches of 20, with a 500ms yield between batches using setTimeout to avoid blocking the Joplin UI thread. On a 2000-note vault, initial indexing completes in approximately 8-12 minutes with Ollama. Subsequent runs (incremental only) complete in under 60 seconds for typical daily note volumes.
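The batch-and-yield loop can be sketched as follows; processBatch stands in for the embedding call, and the batch size and yield interval default to the values above.

```typescript
// Process items in fixed-size batches, pausing between batches so long
// indexing runs do not block the Joplin UI thread.
async function processInBatches<T>(
  items: T[],
  processBatch: (batch: T[]) => Promise<void>,
  batchSize = 20,
  yieldMs = 500,
): Promise<void> {
  for (let i = 0; i < items.length; i += batchSize) {
    await processBatch(items.slice(i, i + batchSize));
    if (i + batchSize < items.length) {
      // Yield to the event loop before the next batch.
      await new Promise(resolve => setTimeout(resolve, yieldMs));
    }
  }
}
```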
4. Timeline
| Phase | Weeks | Work |
|---|---|---|
| Community Bonding | --- | Study Joplin plugin APIs (joplin.data, panels), finalize architecture (indexer, analyser, UI), validate incremental indexing approach with mentors |
| Foundation | 1 - 2 | Plugin setup, paginated note fetching, hash-based incremental indexing, persistence in settings |
| Embeddings Pipeline | 3 - 4 | Integrate Ollama + transformers.js, generate/store embeddings, batching for performance |
| Clustering + MVP | 5 - 6 | HDBSCAN clustering, confidence thresholds, basic tag/notebook suggestions (MVP ready) |
| Intelligence Layer | 7 - 8 | LLM tag generation, confidence scoring, Command Dispatcher, duplicate prevention, rollback system |
| UI Development | 9 - 10 | React panel, Tags/Notebooks/Archive tabs, suggestion selection + apply workflow |
| Polish & Finalization | 11 - 12 | Cold note detection, settings UI, performance optimization, testing, documentation |
5. Availability
Weekly availability during GSoC
I can dedicate approximately 40 hours per week on average to the project throughout the GSoC period.
Timezone
India Standard Time (IST) (GMT+5:30).
Other commitments during the programme
I do not have any exams, internships, or other major commitments during the GSoC period. I will prioritize this project fully and ensure consistent progress. I will maintain regular communication with mentors through the Joplin forum and Discord, and provide structured weekly updates on progress.