GSoC 2026 Proposal Draft – Idea 3: AI-Based Categorisation – Sasha

Thank you so much for the kind words and for taking the time to review — I really appreciate it!!! And yes, I'm genuinely having a blast diving into this :grinning_face_with_smiling_eyes:

Regarding the draft: I hit the Discourse character limit, so I couldn't append the latest revisions directly to the post. The revised proposal is saved locally, and I'll make sure the final submission on the GSoC website reflects all the corrections.

On your UI/UX questions — here's how I'm thinking about the embedding experience for large/long note collections:

1. Progress tracking during embedding:

Rather than a vague spinner, the sidebar panel would show a determinate progress bar with contextual info:


Indexing notes... 342 / 1,247 (27%)
├─ Current: "Meeting Notes: Q2 Planning"
├─ Speed: ~10.2 notes/sec
├─ Elapsed: 0:34
└─ [Cancel]

The progress bar updates via postMessage from the Worker after each note completes. This is the same Worker ↔ Plugin IPC pattern validated in the POC and used in Joplin's official worker example plugin (packages/app-cli/tests/support/plugins/worker/).

I opted for elapsed time + percentage rather than an ETA countdown. From what I found while researching this, remaining-time displays can actually increase frustration compared to elapsed-time feedback, because users anchor on the countdown and get annoyed when it fluctuates. Elapsed time paired with a determinate percentage bar lets users mentally extrapolate without the system making a promise it might break.

The speed indicator (notes/sec) still gives a rough sense of scale. Since notes vary wildly in length (34 ms for a short note vs 1,000 ms for a long doc page, as measured in the POC), the speed display uses an exponential moving average (alpha ~0.10) rather than a simple arithmetic mean: EMA weights recent samples more heavily, so it tracks current processing speed better when note lengths vary. This is consistent with a benchmark of 14 ETA algorithms in which EMA significantly outperformed naive averages for variable-speed workloads.
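To make the smoothing concrete, here is a minimal sketch of how the speed line could be computed. The names ThroughputEma and formatProgress are hypothetical (they are not from the POC or the Joplin API), and smoothing the per-note duration rather than the instantaneous rate is my own assumption about the cleanest way to do it.

```typescript
// Hypothetical helper, not part of the POC or the Joplin API.
// Smooths the per-note embedding time with an EMA and reports notes/sec.
class ThroughputEma {
  private readonly alpha: number;
  private emaMs: number | null = null; // smoothed milliseconds per note

  constructor(alpha: number = 0.1) {
    this.alpha = alpha;
  }

  // Feed the time one note took to embed; returns the smoothed notes/sec.
  update(noteDurationMs: number): number {
    const sample = Math.max(noteDurationMs, 1); // guard against 0 ms samples
    this.emaMs = this.emaMs === null
      ? sample // seed the average with the first sample
      : this.alpha * sample + (1 - this.alpha) * this.emaMs;
    return 1000 / this.emaMs;
  }
}

// Builds the one-line status text: count, percentage, speed, elapsed time.
function formatProgress(done: number, total: number, notesPerSec: number, elapsedSec: number): string {
  const pct = total > 0 ? Math.round((done / total) * 100) : 0;
  const mins = Math.floor(elapsedSec / 60);
  const secs = Math.floor(elapsedSec % 60);
  const clock = `${mins}:${secs < 10 ? '0' : ''}${secs}`;
  const count = `${done.toLocaleString('en-US')} / ${total.toLocaleString('en-US')}`;
  return `Indexing notes... ${count} (${pct}%) | ~${notesPerSec.toFixed(1)} notes/sec | ${clock}`;
}
```

In this sketch the Worker would call update() once per finished note and include the result in its progress message. Averaging durations (and inverting at the end) keeps a burst of very short notes from inflating the displayed rate.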

2. Notification when done:

  • If the user stays in Joplin: The sidebar panel transitions from the progress view to the suggestions view — "Done! 1,247 notes indexed. 23 suggestions ready for review." This is the natural flow since the panel is already open.

  • If the user switches away (or minimises Joplin): A toast notification via joplin.views.dialogs.showToast(), which the plugin API exposes with ToastType.Success to display a corner notification inside the app: "Indexing complete: 23 suggestions ready." When the user returns to Joplin, the sidebar panel already shows the results, and the toast may still be visible. For longer jobs, the panel itself retains a persistent "completed" banner so the user doesn't miss the result even if the toast has already disappeared.

  • For incremental re-indexing (background, after sync): No notification unless new suggestions are generated. The sidebar panel content updates via setHtml() to show a subtle "3 new suggestions" indicator at the top of the suggestion list — the user sees it next time they glance at the panel. This avoids notification fatigue for routine background work.
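The three cases above boil down to a small decision function. This is only a sketch: chooseNotice and its types are hypothetical names, and how the plugin learns whether the app is focused is an assumption I haven't verified against the plugin API. The actual calls to showToast() and setHtml() are left out so the logic can run outside Joplin.

```typescript
// Hypothetical types and function; nothing here is a Joplin API.
type IndexRun = 'bulk' | 'incremental';

type Notice =
  | { channel: 'panel-summary'; text: string } // panel switches to the suggestions view
  | { channel: 'toast'; text: string }         // would be shown via showToast()
  | { channel: 'panel-badge'; text: string }   // would be rendered via setHtml()
  | { channel: 'none' };

function chooseNotice(run: IndexRun, appFocused: boolean, notesIndexed: number, suggestions: number): Notice {
  if (run === 'incremental') {
    // Background re-indexing stays silent unless it produced something new.
    if (suggestions === 0) return { channel: 'none' };
    const noun = suggestions === 1 ? 'suggestion' : 'suggestions';
    return { channel: 'panel-badge', text: `${suggestions} new ${noun}` };
  }
  if (appFocused) {
    const indexed = notesIndexed.toLocaleString('en-US');
    return {
      channel: 'panel-summary',
      text: `Done! ${indexed} notes indexed. ${suggestions} suggestions ready for review.`,
    };
  }
  return { channel: 'toast', text: `Indexing complete: ${suggestions} suggestions ready.` };
}
```

Keeping the decision separate from the rendering also makes the "no notification for routine background work" rule easy to unit test.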

3. Designing for the "large batch" UX:

The core UX problem with batch embedding is that it's a wait-then-act workflow — the user triggers "Analyse All", waits, then reviews. A few ideas to make this less painful:

  • Streaming suggestions: Don't wait until all 1,247 notes are embedded to show suggestions. As each note completes, immediately run centroid classification + KNN tagging against the already-indexed portion. Suggestions trickle into the panel in real time, so the user can start reviewing while embedding continues in the background. This turns "wait a few minutes, then review" into "start reviewing after 10 seconds." Early suggestions might shift slightly as more notes get indexed (centroids update), but for a first pass this is much better than staring at a progress bar.

  • Smart ordering: Process notes most likely to need action first — notes in "Inbox" or the default notebook, recently created notes, untagged notes. This front-loads the useful suggestions so the user sees value immediately.

  • "Pause & Resume": If the user needs to close Joplin mid-indexing, the progress persists — the hash map and vector store are flushed to disk every 50 notes during bulk indexing (as described in section 4.4.3 of the proposal). On next launch, a banner in the panel: "Indexing paused at 342/1,247. [Resume] [Cancel]" — no lost work.

  • First-run experience: On first install with, say, 2,000 existing notes, a dedicated onboarding view in the sidebar panel: "Welcome! Let's index your notes. This takes about 3–4 minutes for your collection. You can keep using Joplin, and we'll notify you when suggestions are ready." This sets expectations upfront rather than surprising the user with a long-running background task.
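As a rough illustration of "Smart ordering" and "Pause & Resume" together, the sketch below sorts notes by a priority score and then splits whatever is still pending into batches of 50, so the caller can flush after each batch. Everything in it is hypothetical: the NoteMeta shape, the score weights, and the 7-day "recent" window are placeholder assumptions, not values from the proposal.

```typescript
// Hypothetical note metadata; field names are illustrative only.
interface NoteMeta {
  id: string;
  inInbox: boolean;     // lives in "Inbox" or the default notebook
  tagCount: number;
  createdTime: number;  // Unix time in ms
}

const RECENT_WINDOW_MS = 7 * 24 * 60 * 60 * 1000; // assumed: "recent" = last 7 days

// Higher score = processed earlier. Weights are placeholder assumptions.
function priorityScore(note: NoteMeta, now: number): number {
  let score = 0;
  if (note.inInbox) score += 4;
  if (note.tagCount === 0) score += 2;
  if (now - note.createdTime < RECENT_WINDOW_MS) score += 1;
  return score;
}

// Returns a sorted copy; ties go to the newer note.
function orderForIndexing(notes: NoteMeta[], now: number): NoteMeta[] {
  return notes.slice().sort((a, b) => {
    const diff = priorityScore(b, now) - priorityScore(a, now);
    return diff !== 0 ? diff : b.createdTime - a.createdTime;
  });
}

// Skips notes already indexed in a previous session and chunks the rest.
// The caller would flush the hash map and vector store after each batch.
function planBatches(ordered: NoteMeta[], doneIds: Set<string>, batchSize: number = 50): NoteMeta[][] {
  const pending = ordered.filter((note) => !doneIds.has(note.id));
  const batches: NoteMeta[][] = [];
  for (let i = 0; i < pending.length; i += batchSize) {
    batches.push(pending.slice(i, i + batchSize));
  }
  return batches;
}
```

On resume, doneIds would be rebuilt from the persisted hash map, so a run paused at 342/1,247 would simply plan batches for the remaining 905 notes.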

4. One UX annoyance I'd love to fix:

The "Accept/Reject one-by-one" flow gets tedious when there are 50+ suggestions. Beyond just "Accept All / Reject All" (already in the proposal), I'm thinking about grouped actions, with suggestions clustered by type: "Move 8 notes to 'Recipes'" as a single expandable card in the panel, rather than 8 individual move suggestions. The user can accept the whole group, expand to cherry-pick, or reject the batch. This mirrors how notification grouping in Apple's iOS 12 and batch actions in Gmail work: it respects the user's time while keeping them in control.

One important nuance here: automation bias research warns that blanket "Accept All" options risk complacency — users stop actually reviewing individual items. To mitigate this, the group card would show a preview of 2–3 representative items from the group before allowing group-level acceptance, so the user has to at least glance at what they're approving. For groups where individual items have mixed confidence scores (e.g., some green >0.85, some yellow 0.7–0.85), the card would flag this and encourage expanding before accepting. This way grouping reduces cognitive load without sacrificing review quality.
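Here is a small sketch of how the group cards and the mixed-confidence flag could be derived. The MoveSuggestion and GroupCard shapes are hypothetical, the "low" band below 0.7 is my own assumption (only green and yellow are defined above), and taking the first few items as the preview is a placeholder for whatever "representative" ends up meaning.

```typescript
// Hypothetical shapes; not taken from the proposal's data model.
interface MoveSuggestion {
  noteTitle: string;
  targetNotebook: string;
  confidence: number; // 0..1
}

interface GroupCard {
  targetNotebook: string;
  count: number;
  preview: string[];        // titles shown before group-level acceptance
  mixedConfidence: boolean; // true when items fall into different bands
}

type Band = 'green' | 'yellow' | 'low';

// green > 0.85, yellow 0.7–0.85; the 'low' band is an assumption.
function confidenceBand(confidence: number): Band {
  if (confidence > 0.85) return 'green';
  if (confidence >= 0.7) return 'yellow';
  return 'low';
}

function buildGroupCards(suggestions: MoveSuggestion[], previewSize: number = 3): GroupCard[] {
  // Group by target notebook, preserving first-seen order.
  const byTarget = new Map<string, MoveSuggestion[]>();
  for (const s of suggestions) {
    const group = byTarget.get(s.targetNotebook);
    if (group) {
      group.push(s);
    } else {
      byTarget.set(s.targetNotebook, [s]);
    }
  }

  const cards: GroupCard[] = [];
  byTarget.forEach((items, targetNotebook) => {
    const firstBand = confidenceBand(items[0].confidence);
    cards.push({
      targetNotebook,
      count: items.length,
      preview: items.slice(0, previewSize).map((s) => s.noteTitle),
      mixedConfidence: items.some((s) => confidenceBand(s.confidence) !== firstBand),
    });
  });
  return cards;
}
```

The panel could then disable or de-emphasise the group-level "Accept" button whenever mixedConfidence is true, nudging the user to expand the card first.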

Looking forward to your feedback on the LLM/agentic sections soon!! :grinning_face_with_smiling_eyes: :grinning_face_with_smiling_eyes: