Project Name: AI-based Categorisation
GitHub Profile: angeladev333
Introduction post: here
1. Introduction
I am a 4th-year CS student at the University of Waterloo with experience in TypeScript, React, and Python. During my previous internship at Bloomberg, I trained an ML model based on a Decision Tree Classifier on real client data to pair matching trades for reconciliation, and I have helped others with their RAG-related projects.
This project seeks to automate the "administrative" side of note-taking by using local AI to suggest tags, organize notebook structures, and identify "cold" notes for archiving.
2. Project Summary
This project addresses a fundamental problem: keeping a large collection of notes organized. The longer users work in Joplin, the larger the workspace grows and the more notebooks they create, until it becomes hard to find notes on a given topic.
The project will be a plugin that analyzes note content to provide three core organizational services:
- Smart Tagging: Automatically suggests and applies tags based on existing user patterns.
- Notebook Auto-Filing: Detects when a note semantically belongs in a different notebook and suggests a move.
- Archive Discovery: Identifies notes that haven't been touched or viewed in a long time and suggests moving them to an "Archive" stack to reduce clutter.
Expected Outcome:
- Local-First Engine: A categorization system using `transformers.js` (WASM) to keep all data private.
- Review UI: A dedicated Joplin Panel where users can "Approve" or "Reject" bulk organizational suggestions.
- Custom Rules: Ability for users to "teach" the AI by giving it examples of how they prefer to categorize.
3. Technical Approach
3.1 Architecture: The "Librarian" Service
I will implement a background service that maintains a local semantic index of the user's notebooks.
- Detection: Hook into `joplin.workspace.onNoteChange` and `joplin.workspace.onSyncComplete`.
- Inference: Use `transformers.js` with the `Xenova/all-MiniLM-L6-v2` model (~23MB) to generate embeddings for each note.
- Classification:
  - For Tagging: Use K-Nearest Neighbors (KNN) to find notes with similar content and suggest their tags.
  - For Notebooks: Use a centroid-based classifier where each notebook is represented by the average vector of its contained notes.
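The two classification steps above can be sketched in plain TypeScript. This is a minimal illustration rather than the final design: the `IndexedNote` shape, the function names, and the `k` / `minVotes` parameters are all hypothetical, and the embeddings are assumed to have been computed already by the inference step.

```typescript
// Hypothetical shape of one entry in the local semantic index.
interface IndexedNote {
  id: string;
  notebookId: string;
  tags: string[];
  embedding: number[]; // all-MiniLM-L6-v2 yields 384 dimensions; any length works here
}

// Cosine similarity between two vectors of equal length (0 if either is all zeros).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// KNN tagging: take the k most similar notes and let their tags vote.
// Only tags carried by at least minVotes neighbours are suggested.
function suggestTags(query: number[], index: IndexedNote[], k: number, minVotes: number): string[] {
  const neighbours = index
    .map((note) => ({ note, score: cosineSimilarity(query, note.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
  const votes: Record<string, number> = {};
  for (const { note } of neighbours) {
    for (const tag of note.tags) votes[tag] = (votes[tag] || 0) + 1;
  }
  return Object.keys(votes)
    .filter((tag) => votes[tag] >= minVotes)
    .sort((x, y) => votes[y] - votes[x]);
}

// Centroid classifier, step 1: average the embeddings of each notebook's notes.
function notebookCentroids(index: IndexedNote[]): Record<string, number[]> {
  const sums: Record<string, number[]> = {};
  const counts: Record<string, number> = {};
  for (const note of index) {
    if (!sums[note.notebookId]) {
      sums[note.notebookId] = note.embedding.map(() => 0);
      counts[note.notebookId] = 0;
    }
    const sum = sums[note.notebookId];
    note.embedding.forEach((v, i) => { sum[i] += v; });
    counts[note.notebookId] += 1;
  }
  for (const id of Object.keys(sums)) {
    sums[id] = sums[id].map((v) => v / counts[id]);
  }
  return sums;
}

// Centroid classifier, step 2: pick the notebook whose centroid is closest to the note.
function nearestNotebook(
  query: number[],
  centroids: Record<string, number[]>
): { notebookId: string; score: number } | null {
  let best: { notebookId: string; score: number } | null = null;
  for (const notebookId of Object.keys(centroids)) {
    const score = cosineSimilarity(query, centroids[notebookId]);
    if (best === null || score > best.score) best = { notebookId, score };
  }
  return best;
}
```

In this sketch a move would only be suggested when `nearestNotebook` returns a notebook other than the note's current one, ideally with a similarity margin that the user can tune.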
3.2 Archive Discovery ("Cold Note" Detection)
Since Joplin does not natively track "last viewed" time, I will implement a lightweight tracking mechanism:
- Activity Logging: Use the `joplin.workspace.onNoteSelectionChange` event to record a `last_viewed_time` in the note's `userData`.
- Archiving Logic: A weekly background task will query notes where `(current_time - last_viewed_time) > User_Defined_Threshold` and `updated_time` is also old. These will be surfaced in the "Archive Suggestions" UI.
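The archiving rule can be sketched as a pure filter over note timestamps. This is a sketch under stated assumptions: the `NoteActivity` shape and `findColdNotes` name are hypothetical, the same threshold is applied to both `last_viewed_time` and `updated_time` (the proposal only says the latter must be "also old"), and a note with no recorded view falls back to its `updated_time`.

```typescript
// Hypothetical view of a note's activity; timestamps are milliseconds since epoch.
interface NoteActivity {
  id: string;
  updatedTime: number;     // Joplin's updated_time
  lastViewedTime?: number; // last_viewed_time from userData; absent if never recorded
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the ids of notes that were neither viewed nor updated within the threshold.
function findColdNotes(notes: NoteActivity[], now: number, thresholdDays: number): string[] {
  const thresholdMs = thresholdDays * DAY_MS;
  return notes
    .filter((note) => {
      // Assumption: with no recorded view, treat the last update as the last view.
      const lastViewed = note.lastViewedTime !== undefined ? note.lastViewedTime : note.updatedTime;
      const viewedLongAgo = now - lastViewed > thresholdMs;
      const updatedLongAgo = now - note.updatedTime > thresholdMs;
      return viewedLongAgo && updatedLongAgo;
    })
    .map((note) => note.id);
}
```

Keeping the rule free of any Joplin calls means the weekly task only has to gather timestamps and pass in the current time, which also makes the threshold logic easy to unit test.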
3.3 The "Review & Apply" Workflow (React UI)
To avoid "AI anxiety," the plugin will never move or tag notes without permission.
- The Panel: A React-based sidebar created via `joplin.views.panels.create()`.
- Batch Actions: Users can "Select All" suggestions (e.g., "Tag 12 notes as #Research") and apply them in one click via the `joplin.data` API.
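One way to keep the review state separate from the panel is to model suggestions as plain data and derive the write operations from whatever the user approved. The sketch below is illustrative only: `Suggestion`, `DataCall`, and the three functions are hypothetical names, and archiving is treated as a move into an "Archive" notebook.

```typescript
type ReviewState = "pending" | "approved" | "rejected";

// Hypothetical suggestion record shown in the review panel.
interface Suggestion {
  id: string;
  kind: "tag" | "move";
  noteId: string;
  targetId: string; // tag id for "tag", destination notebook id for "move"
  state: ReviewState;
}

// Hypothetical description of one write to be sent through the data API.
interface DataCall {
  method: "post" | "put";
  path: string[];
  body: Record<string, string>;
}

// Set the state of the selected suggestions without mutating the input list.
function review(list: Suggestion[], ids: string[], state: ReviewState): Suggestion[] {
  return list.map((s): Suggestion => (ids.indexOf(s.id) !== -1 ? { ...s, state } : s));
}

// "Select All": approve everything still pending; earlier rejections are kept.
function approveAllPending(list: Suggestion[]): Suggestion[] {
  return list.map((s): Suggestion => (s.state === "pending" ? { ...s, state: "approved" } : s));
}

// Turn approved suggestions into data API calls; nothing else is ever written.
function planDataCalls(list: Suggestion[]): DataCall[] {
  return list
    .filter((s) => s.state === "approved")
    .map((s): DataCall =>
      s.kind === "tag"
        ? { method: "post", path: ["tags", s.targetId, "notes"], body: { id: s.noteId } }
        : { method: "put", path: ["notes", s.noteId], body: { parent_id: s.targetId } }
    );
}
```

Each planned call would then be executed from the plugin, for example as `joplin.data.post(call.path, null, call.body)` or `joplin.data.put(call.path, null, call.body)`, assuming those signatures match the plugin API version in use. Because rejected and pending suggestions never reach `planDataCalls`, notes are only changed after explicit approval.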
4. Implementation Plan
- Weeks 1-2: Set up `transformers.js` in the plugin sandbox. Implement the `last_viewed_time` tracker using `userData`.
- Weeks 3-5: Develop the "Note-to-Notebook" similarity engine. Benchmark performance for users with 2,000+ notes to ensure no UI lag.
- Weeks 6-8: Build the React Sidebar Panel. Implement the "Suggestion" logic and the "Accept/Reject" state management.
- Weeks 9-11: Add "Auto-Archive" discovery. Refine the UI for bulk-applying changes.
- Week 12: Final testing on mobile/desktop sync compatibility and documentation.
5. Deliverables
- Joplin ANI Plugin: The core `.jpl` package.
- Semantic Model Integration: Optimized local inference pipeline.
- Archive Dashboard: A UI tool for note lifecycle management.
- Technical Documentation: Guidelines for extending the AI to support PARA or Johnny Decimal organizational methods.
6. Availability
- Weekly availability: ~40 hours per week during GSoC
- Time zone: EST