Links
- Project Idea: https://github.com/joplin/gsoc/blob/master/ideas.md (Idea 5)
- GitHub Profile: https://github.com/Vinayreddy765
- Forum Introduction: https://discourse.joplinapp.org/t/introducing-vinayreddy/48835
- PR #14566 (merged): https://github.com/laurent22/joplin/pull/14566
- PR #14638 (merged): https://github.com/laurent22/joplin/pull/14638
- PR #14665: https://github.com/laurent22/joplin/pull/14665
- PoC Plugin Repository: https://github.com/Vinayreddy765/joplin-plugin-image-alt-text
1. Introduction
My name is Vinay Reddy, and I am currently pursuing a Bachelor of Engineering in Information Science and Engineering at Global Academy of Technology, Bengaluru, India (expected graduation: 2027). I am based in IST (UTC+5:30).
I have been contributing to the Joplin codebase since early 2026, with three merged pull requests in the desktop application. Through these contributions I became familiar with Joplin's architecture — specifically the encryption settings UI, resource handling, and note model. For this proposal I studied Resource.ts, Note.ts, and ResourceService.ts directly to understand how images are stored and processed internally.
Joplin has a strong focus on accessibility. However, images embedded in notes currently carry no descriptive alt text — they appear in Markdown as:
```markdown
![](:/resourceId)
```
This means screen readers encounter a completely blank description for every image, leaving visually impaired users with no information about the content. For all users, image content is also invisible to Joplin's search system.
AI vision models can now generate accurate, detailed image descriptions automatically. This project proposes to build a Joplin plugin that scans all images in a user's note collection, calls an AI vision model to generate a description for each one, and writes the description back as Markdown alt text:
```markdown
![A red torii gate surrounded by autumn maple trees at dusk.](:/resourceId)
```
This directly addresses the GSoC idea's expected outcome and makes Joplin's image content genuinely accessible for the first time.
2. Project Summary
Problem: Images in Joplin notes have empty alt text, making them inaccessible to screen readers and invisible to search.
What will be implemented: A Joplin plugin built with TypeScript using the official Plugin API that scans image resources, generates AI descriptions, and writes them back into note bodies as Markdown alt text. The plugin will support both a local AI provider (Ollama — free, private, no data leaves the device) and a cloud provider (OpenAI — opt-in, requires user to provide their own API key).
Expected outcome: Every image with missing or weak alt text in the user's note collection gets a meaningful accessibility description automatically, improving accessibility for visually impaired users and making image content searchable.
3. Technical Approach
3.1 How Joplin Stores Images — From Codebase Study
Images in Joplin are stored as resources and referenced in note bodies using resource IDs. They can appear in two formats — Markdown (`![alt](:/resourceId)`) or HTML (`<img src=":/resourceId"/>`). Joplin already has an existing OCR pipeline that follows a similar scan-process-write-back pattern. This plugin follows the same logical approach at the plugin level, building on existing resource handling patterns found in Resource.ts and Note.ts.
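To make the scan step concrete, here is a minimal sketch of how both reference formats could be detected in a note body. The function name `findImageRefs` and the 32-character hex ID pattern are assumptions of this sketch, not final plugin code:

```typescript
// Extract image resource references from a note body.
// Handles both Markdown (![alt](:/id)) and HTML (<img src=":/id">) forms.
// Joplin resource IDs are 32 lowercase hex characters.

interface ImageRef {
  resourceId: string;
  altText: string;
}

function findImageRefs(body: string): ImageRef[] {
  const refs: ImageRef[] = [];

  // Markdown images: ![alt](:/resourceId)
  const mdRe = /!\[([^\]]*)\]\(:\/([0-9a-f]{32})\)/g;
  for (const m of body.matchAll(mdRe)) {
    refs.push({ altText: m[1], resourceId: m[2] });
  }

  // HTML images: <img src=":/resourceId" alt="...">
  const htmlRe = /<img[^>]*src=["']:\/([0-9a-f]{32})["'][^>]*>/g;
  for (const m of body.matchAll(htmlRe)) {
    const altMatch = m[0].match(/alt=["']([^"']*)["']/);
    refs.push({ altText: altMatch ? altMatch[1] : '', resourceId: m[1] });
  }

  return refs;
}
```

A single pass over the body with both patterns keeps the scanner simple and ensures neither format is missed.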
3.2 Architecture
The plugin follows this pipeline:
Figure 1: All processing occurs locally via Ollama — no image data leaves the user's device.
The plugin reads image resources from the note via the Joplin Plugin API, prepares images for AI processing, and sends them to the configured AI provider. The provider returns a concise accessibility description, which the plugin writes back into the note body as Markdown alt text. When using Ollama, this entire process happens locally on the user's device — no data is sent externally.
3.3 Plugin File Structure
```
joplin-plugin-image-alt-text/
├── src/
│   ├── index.ts              (plugin entry point, Tools menu)
│   ├── AltTextService.ts     (core scanning and processing)
│   ├── ReviewDialog.ts       (user review before applying changes)
│   ├── NoteUpdater.ts        (writes alt text back to notes)
│   └── providers/
│       ├── AltTextProvider.ts  (common interface)
│       ├── OllamaProvider.ts   (local AI — free, private)
│       └── OpenAIProvider.ts   (cloud AI — opt-in)
├── manifest.json
└── package.json
```
3.4 How the Plugin Works
The plugin reads image resources from the note via the Joplin Plugin API, converts them to Base64, and sends them to the configured AI provider. The provider returns a concise accessibility description, which the plugin presents to the user for review before writing it back into the note body as alt text. The provider layer is abstracted behind a common interface — both Ollama and OpenAI work identically from the user's perspective. Images are detected in both Markdown (`![alt](:/resourceId)`) and HTML (`<img src=":/resourceId"/>`) formats to ensure no images are missed.
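As an illustration of the provider abstraction described above, the following sketch defines a hypothetical `AltTextProvider` interface and a stub implementation. The real OllamaProvider and OpenAIProvider would implement the same contract; names and signatures here are assumptions, not final API:

```typescript
// Common provider contract: both providers expose describeImage(),
// so the rest of the plugin never needs to know which backend is active.
interface AltTextProvider {
  readonly name: string;
  // Takes a base64-encoded image and its MIME type,
  // returns a one-sentence accessibility description.
  describeImage(imageBase64: string, mimeType: string): Promise<string>;
}

// A stub provider used here only to illustrate the contract.
class StubProvider implements AltTextProvider {
  readonly name = 'stub';
  async describeImage(imageBase64: string, mimeType: string): Promise<string> {
    return `(${mimeType}, ${imageBase64.length} bytes of base64)`;
  }
}

// The calling code depends only on the interface.
async function generateAltText(
  provider: AltTextProvider,
  imageBase64: string,
  mimeType: string,
): Promise<string> {
  const description = await provider.describeImage(imageBase64, mimeType);
  return description.trim();
}
```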
3.5 Privacy Design
The plugin is privacy-first by default:
- Ollama (default): Runs entirely locally. No image data ever leaves the user's device. No API key required. Free.
- OpenAI (opt-in): Only enabled if the user explicitly selects it and provides their own API key. A clear warning is displayed explaining that images are sent to an external server.
This directly aligns with Joplin's privacy-first philosophy.
3.6 Why This Design Works
This design ensures:
- Non-intrusive workflow — generation is always manually triggered by default. The plugin never modifies notes without the user's explicit action.
- Full user control — nothing is written to a note without passing through the review dialog first. Users can accept, edit, or skip every description.
- Privacy by default — Ollama processes everything locally on the user's device. No image data leaves the machine unless the user explicitly opts into OpenAI.
- Safe for existing content — the plugin classifies alt text before processing. Meaningful descriptions written by the user are never overwritten.
- Consistent experience — the plugin works across both Markdown and Rich Text editors, adapting its UI to match what the user sees in each editor.
3.7 Behaviour and Edge Cases
Handling Existing Alt Text
Images embedded in notes may already contain alt text. The plugin distinguishes between three cases:
Case 1 — Missing Alt Text
Example:

```markdown
![](:/resourceId)
```

or

```markdown
<img src=":/resourceId" alt=""/>
```

In this case, the plugin automatically generates an accessibility description and replaces the alt text.
Example result:

```markdown
![A red torii gate surrounded by autumn maple trees at dusk.](:/resourceId)
```
Case 2 — Meaningful Existing Alt Text
Example:

```markdown
![Hand-drawn diagram of the project architecture](:/resourceId)
```

The plugin will not overwrite this text by default.
Instead, the image is skipped and recorded as already having descriptive alt text.
This ensures that user-written descriptions are never replaced automatically.
Case 3 — Weak Alt Text
Example:

```markdown
![IMG_20240315_142311.jpg](:/resourceId)
```

These cases usually originate from file names rather than real descriptions.
The plugin can optionally replace these descriptions if the user enables the setting Replace weak alt text.
The plugin uses heuristics to identify weak alt text — such as filenames, very short strings, and common auto-generated patterns. Distinguishing weak from meaningful alt text is a nuanced problem that will be refined through testing and mentor feedback during implementation.
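The heuristic classification could start from something like the sketch below. The function name, thresholds, and patterns are illustrative and would be tuned during implementation with mentor feedback:

```typescript
// Heuristic alt-text classifier (a sketch; thresholds and patterns
// are assumptions to be refined during testing).

type AltClass = 'missing' | 'weak' | 'meaningful';

function classifyAltText(alt: string): AltClass {
  const text = alt.trim();
  if (text.length === 0) return 'missing';

  // Filename-like patterns: IMG_20240315_142311.jpg, screenshot-2024.png, ...
  const filenameRe =
    /^(img|dsc|image|screenshot|photo)?[-_ ]?[\d\-_.]*\.(png|jpe?g|webp|gif)$/i;
  if (filenameRe.test(text)) return 'weak';

  // Very short strings ("pic", "img") carry no information.
  if (text.length < 5) return 'weak';

  // Generic auto-generated placeholders.
  if (/^(image|picture|photo|untitled)$/i.test(text)) return 'weak';

  return 'meaningful';
}
```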
Regeneration Rules
Users may want to regenerate alt text when:
- the AI description is inaccurate
- the image content changes
- a different AI provider or model is selected

The plugin supports regeneration through:
Tools → Regenerate Alt Text

Rules:
- regeneration applies only to images selected by the user
- existing alt text is replaced only when regeneration is explicitly triggered
- a preview dialog is shown before applying changes

This prevents accidental overwriting. When the user triggers regeneration, the plugin identifies all images with existing alt text in the current note, presents them one by one in the review dialog, and only replaces descriptions the user explicitly accepts.
Batch Processing
Notes may contain multiple images. The plugin supports batch processing with the following workflow:
- Scan the note for image resources
- Identify images requiring alt text generation
- Process images sequentially through the AI provider
- Update the note body after processing completes
Example:
Before:

```markdown
![](:/resourceId1)
![](:/resourceId2)
```

After:

```markdown
![A line chart showing monthly sales growth.](:/resourceId1)
![A whiteboard photo of the sprint planning board.](:/resourceId2)
```
Images that already contain meaningful alt text are skipped automatically.
Batch processing will also support:
- progress reporting
- safe cancellation
- skipping previously processed images
Joplin's data API returns results in paginated batches. The plugin iterates through all pages using the has_more flag to ensure every note in the collection is scanned.
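The pagination loop can be sketched as follows. `fetchPage` is abstracted here so the loop can be shown (and tested) without the `joplin` global; in the plugin it would wrap `joplin.data.get(['notes'], { page, fields: [...] })`:

```typescript
// Paginated scan sketch: Joplin's data API returns { items, has_more },
// so the scanner loops over pages until has_more is false.

interface Page<T> {
  items: T[];
  has_more: boolean;
}

async function fetchAllNotes<T>(
  fetchPage: (page: number) => Promise<Page<T>>,
): Promise<T[]> {
  const all: T[] = [];
  let page = 1; // Joplin pages are 1-indexed
  while (true) {
    const result = await fetchPage(page);
    all.push(...result.items);
    if (!result.has_more) break;
    page += 1;
  }
  return all;
}
```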
4. User Review and Control
Users maintain full control over changes made by the plugin.
Manual Mode (Default)
Users manually trigger generation:
Tools → Generate Alt Text
The plugin processes images only in the current note.
Preview Mode
Before applying changes, users can review generated descriptions:
Example:
```
AI Alt Text Suggestions — 4 images found
Image 1 of 4
Generated Description
[Accept] [Edit] [Skip]
```
Users can:
- accept the suggestion
- edit the description
- skip the image
When the user selects Edit, an inline text field appears allowing them to correct the description before saving.
Undo Support
Before modifying any note, the plugin stores the original note body in memory. Users can revert the entire batch operation using Undo Last Batch, restoring all notes to their previous state.
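A minimal sketch of this snapshot-based undo follows. Class and method names are hypothetical; `saveBody` is injected so the logic is testable, and in the plugin it would wrap `joplin.data.put(['notes', id], null, { body })`:

```typescript
// Undo sketch: before a batch run, snapshot each note body in memory;
// "Undo Last Batch" restores every snapshot.
class BatchUndo {
  private snapshots = new Map<string, string>();

  remember(noteId: string, originalBody: string): void {
    // Only the first snapshot per note matters within one batch:
    // later writes to the same note must not overwrite the original.
    if (!this.snapshots.has(noteId)) this.snapshots.set(noteId, originalBody);
  }

  async undo(
    saveBody: (noteId: string, body: string) => Promise<void>,
  ): Promise<number> {
    let restored = 0;
    for (const [id, body] of this.snapshots) {
      await saveBody(id, body);
      restored += 1;
    }
    this.snapshots.clear();
    return restored;
  }
}
```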
Automatic Mode (Optional)
Users can enable automatic generation for newly inserted images.
Workflow:

```
Insert image → Plugin detects image → AI generates description → Alt text inserted automatically
```
This behaviour is configurable in plugin settings.
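One possible implementation detail, sketched under the assumption that the plugin diffs resource references on note-change events (the event hook would be `joplin.workspace.onNoteChange`; the diff itself is pure and shown here):

```typescript
// Auto-label sketch: on a note change, compare the resource IDs
// referenced before and after the edit; newly added IDs are the
// images to queue for description.
function newlyAddedResourceIds(oldIds: string[], newIds: string[]): string[] {
  const seen = new Set(oldIds);
  return newIds.filter(id => !seen.has(id));
}
```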
5. Error Handling
The plugin is designed to handle common edge cases gracefully, ensuring it never crashes or corrupts note content. The following cases have been identified and will be handled during implementation.
| Edge Case | Handling |
|---|---|
| Unsupported file types | Only PNG, JPG, JPEG, WEBP are processed; others silently skipped |
| Encrypted resources | Skipped — encrypted blobs cannot be read; reported in summary |
| AI runtime unavailable | Clear message: 'AI runtime not detected. Please start Ollama.' |
| Ollama model not installed | Shows exact install command: ollama pull llava |
| Large images | Resized before sending; original file on disk never modified |
| Network failure mid-batch | Skip failed image, continue, offer Retry Failed at end |
| Same resource in multiple notes | All notes updated, description generated once and cached |
| Note modified during batch | Skip to avoid conflict, reported in summary |
| OpenAI rate limiting | Wait 5 seconds and retry once; report if retry also fails |
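The rate-limit rule from the table could be implemented as in this sketch. `callProvider` and `sleep` are injected parameters (assumptions of the sketch) so the policy itself is testable:

```typescript
// Retry sketch matching the table's rate-limit rule: on HTTP 429,
// wait 5 seconds and retry once; a second failure propagates so it
// can be reported in the completion summary.
async function withRateLimitRetry<T>(
  callProvider: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
): Promise<T> {
  try {
    return await callProvider();
  } catch (e: any) {
    if (e?.status !== 429) throw e; // only retry rate-limit errors
    await sleep(5000);
    return await callProvider();
  }
}
```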
6. Potential Challenges and Solutions
AI accuracy and user trust: Vision models can occasionally produce vague or inaccurate descriptions. The plugin addresses this through careful prompt engineering and by always showing descriptions to the user for review before writing anything to notes. Users can edit any description before accepting it.
Local AI availability: The plugin requires Ollama to be running locally for the default provider. If Ollama is unavailable or the required model is not installed, the plugin will show a clear actionable message guiding the user to resolve the issue rather than failing silently.
Large images and performance: High-resolution images may slow processing. Large images will be resized before sending to the AI provider without modifying the original file. Users can also configure lighter models for faster processing on lower-end hardware.
Encrypted resources: Joplin supports end-to-end encryption. Encrypted resources cannot be read at the plugin level and will be skipped gracefully, with a note in the completion summary.
Cloud provider limitations: When using OpenAI, rate limiting and API errors will be handled with appropriate retry logic and clear user messaging. Users are also warned before enabling the cloud provider that images will leave their device.
Cross-platform compatibility: Primary development and testing will be on Windows, with Linux tested via virtual machine. macOS compatibility will be validated with community assistance during the testing phase.
AI Runtime Considerations
Joplin plugins run inside a sandboxed environment which restricts the use of native modules directly within the plugin process.
For local AI inference, the plugin will therefore rely on runtime environments that work within this sandbox. The default approach uses the Ollama HTTP interface — Ollama runs as a separate local process outside the sandbox, and the plugin communicates with it via a simple HTTP API. This avoids all sandbox restrictions while keeping processing fully local on the user's device.
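A sketch of the Ollama call follows. The `/api/generate` endpoint with `stream: false` and base64 `images` follows Ollama's documented HTTP API; the model name `llava`, the prompt wording, and the function names are assumptions of this sketch:

```typescript
// Build the request body for Ollama's /api/generate endpoint.
// Kept as a separate pure function so it can be tested without a
// running Ollama instance.
function buildOllamaRequest(model: string, imageBase64: string): string {
  return JSON.stringify({
    model,
    prompt:
      'Describe this image in one concise sentence for use as accessibility alt text.',
    images: [imageBase64],
    stream: false, // return a single JSON object instead of a stream
  });
}

// The actual call: Ollama runs as a separate local process, so the
// plugin only needs plain HTTP — no native modules in the sandbox.
async function describeWithOllama(
  imageBase64: string,
  model = 'llava',
): Promise<string> {
  const res = await fetch('http://127.0.0.1:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: buildOllamaRequest(model, imageBase64),
  });
  if (!res.ok) throw new Error('AI runtime not detected. Please start Ollama.');
  const json = await res.json();
  return (json.response as string).trim();
}
```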
Alternative approaches such as WASM-based inference runtimes may also be explored during implementation as they can run directly inside the plugin sandbox without requiring a separate installation. The performance tradeoffs between these approaches will be evaluated with mentor guidance during the project.
7. User Experience and User Flows
The plugin is designed to be non-intrusive and fully controllable. Users are never surprised by changes to their notes.
Flow 1 — Single Note Mode (Default)
User opens a note and clicks Tools → Generate Alt Text. The plugin scans the note and presents each image for review one at a time:
```
AI Alt Text Review — 4 images found
Image 1 of 4  [Image preview]
Proposed description:
"A red torii gate surrounded by autumn maple trees at dusk."
Actions: [Accept] [Edit] [Skip]
Batch actions: [Accept All] [Cancel All]
```
Nothing is written to the note until the user clicks Accept or Accept All.
Flow 2 — Batch Mode (All Notes)
User clicks Tools → Generate Alt Text (All Notes). The plugin processes the entire note collection in the background:

```
Processing notes…
47 of 312 scanned — 12 images labeled
[Cancel]
```

On completion:

```
✓ Processing complete
47 images labeled across 23 notes
Actions: [View Changes] [Undo] [Close]
```

Undo is available for 60 seconds. If a note was modified during processing, it is skipped and reported in the summary.
Flow 3 — Auto-label Mode (Optional)
User enables auto-labeling in plugin settings. When a new image is inserted:

```
Insert image → Plugin detects new image → AI generates description in background
→ Alt text inserted automatically → User can edit at any time
```
This mode is off by default — users must explicitly enable it.
Flow 4 — Markdown Editor vs Rich Text Editor
The plugin adapts its UI based on which editor the user is working in:
| | Markdown Editor | Rich Text Editor |
|---|---|---|
| User sees | Raw `![alt](:/resourceId)` syntax | Rendered image visually |
| How to trigger | Tools → Generate Alt Text | Right-click image → Generate Alt Text |
| Review dialog | Shows Markdown with proposed text | Shows rendered image with input field |
| Edit alt text | Edit text directly | Inline input field overlay on image |
Markdown Editor
When the plugin detects an image without alt text:

```
![](:/resourceId)   ← Missing alt text
```

User options: [Generate with AI] [Write manually]
Rich Text Editor
Because the Rich Text editor hides Markdown syntax, the plugin exposes alt-text actions through the UI.
Available actions:
- Right-click image → Generate Alt Text
- Toolbar button → Alt Text
The review dialog then displays the image preview alongside the generated description.
When the user accepts a description in the Rich Text editor, the plugin writes the alt text back to the underlying note body by updating the alt attribute of the corresponding `<img>` tag directly — for example, changing `<img src=":/resourceId" alt=""/>` to `<img src=":/resourceId" alt="Generated description"/>`.
This ensures the alt text is correctly stored in the note regardless of which editor the user is working in, and is immediately visible if the user switches to the Markdown editor.
Images in notes can appear in either Markdown (`![alt](:/resourceId)`) or HTML (`<img src=":/resourceId"/>`) format. The plugin detects and handles both formats automatically — for Markdown images the alt text is written by replacing the text inside the square brackets, and for HTML images by updating the alt attribute directly. This ensures no images are missed and alt text is correctly stored regardless of how the image was inserted.
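The write-back for a single resource could look like this sketch. The function name is illustrative; since resource IDs are plain hex, interpolating them into a regex is safe:

```typescript
// Replace alt text for one specific resource ID in a note body,
// covering both Markdown and HTML forms, without touching other images.
function setAltText(body: string, resourceId: string, alt: string): string {
  // Markdown: ![old](:/id) -> ![new](:/id)
  const mdRe = new RegExp(`!\\[[^\\]]*\\]\\(:\\/${resourceId}\\)`, 'g');
  let out = body.replace(mdRe, `![${alt}](:/${resourceId})`);

  // HTML: rewrite (or insert) the alt attribute on matching <img> tags.
  const htmlRe = new RegExp(
    `<img([^>]*src=["']:\\/${resourceId}["'][^>]*)>`,
    'g',
  );
  out = out.replace(htmlRe, (_tag, attrs: string) => {
    if (/alt=["'][^"']*["']/.test(attrs)) {
      return `<img${attrs.replace(/alt=["'][^"']*["']/, `alt="${alt}"`)}>`;
    }
    return `<img${attrs} alt="${alt}">`; // no alt attribute yet: add one
  });
  return out;
}
```

Scoping the replacement to one resource ID at a time keeps review granular: only the image the user accepted is modified.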
In all cases the user remains in control — no description is written to a note without explicit user action, except in optional auto-label mode which must be enabled in settings.
8. Evaluation and Performance Metrics
To ensure the effectiveness and reliability of the proposed plugin, the project will be evaluated using both performance metrics and quality evaluation metrics.
8.1 Performance Metrics
The performance of the plugin will be measured based on the following criteria:
Processing Time per Image
- Measure the average time required to generate alt text for a single image.
- Compare performance between local Ollama models and cloud-based models.
Batch Processing Performance
- Evaluate the time required to process multiple images within a note or across the entire note collection.
- Ensure the plugin processes images sequentially to maintain UI responsiveness.
Memory and Resource Usage
- Monitor CPU and memory usage during image processing to ensure the plugin runs efficiently on typical user hardware.
Scalability
- Test performance on notes containing a large number of images to ensure the plugin remains stable and responsive.
8.2 Evaluation Metrics
The quality and usefulness of generated alt text descriptions will be evaluated using the following criteria:
Accuracy of Image Descriptions
- Verify that generated alt text correctly describes the primary subject and key elements of the image.
Accessibility Quality
- Ensure descriptions follow accessibility guidelines by focusing on meaningful visual information useful for screen reader users.
Consistency of Output
- Evaluate whether the generated descriptions maintain a consistent style and level of detail across different images.
Acceptance Rate
- Measure the percentage of generated descriptions accepted without editing during user testing.
Cross-Platform Reliability
- Validate that the plugin functions correctly on Windows, macOS, and Linux environments.
8.3 Testing Strategy
The plugin will be tested at two levels:
Manual Testing
- Test with different image types: PNG, JPG, JPEG, WEBP
- Test with notes containing no alt text, weak alt text, and meaningful existing alt text
- Test single note mode and batch mode with a collection of notes
- Test cancellation mid-batch and undo functionality
- Test with Ollama provider on Windows and Linux
- Test error scenarios: Ollama not running, invalid API key, encrypted resources
- Test in both Markdown editor and Rich Text editor
Automated Testing
- Run Joplin's existing test suite after each change using yarn test to ensure no regressions
- Write basic unit tests for core logic such as alt text classification (missing, weak, meaningful) and resource ID extraction from note bodies
- Write integration tests for the Ollama provider connection and note body update logic
9. Implementation Plan
Community Bonding Period (May 8 – May 25)
- Study Joplin Plugin API documentation in depth
- Explore existing OCR pipeline in Resource.ts to understand patterns to follow
- Discuss provider strategy, privacy requirements, and plugin architecture with mentors
- Finalize design decisions (model selection UI, settings structure, error handling approach)
- Set up development environment and plugin scaffold
Week 1–2 (May 26 – June 7) Plugin Foundation and Resource Scanning
- Implement plugin entry point (index.ts) with command registration
- Implement resource detection logic for images referenced in notes.
- Implement image loading and preparation for AI processing.
- Implement core image scanning service for current note.
- Write unit tests for image scanning and processing logic.
Week 3–4 (June 8 – June 21) Local AI Integration — Ollama Provider
- Implement AltTextProvider.ts interface
- Implement OllamaProvider.ts — connect to local Ollama runtime via HTTP API
- Implement NoteUpdater.ts — safely replace Markdown alt text using regex
- Write integration tests for Ollama provider and note body replacement
- Test with multiple image types (PNG, JPEG, WebP)
- Implement ReviewDialog.ts — preview mode with Accept, Edit, Skip per image
Week 5–6 (June 22 – July 5) Cloud AI Integration and Edge Cases
- Implement OpenAIProvider.ts — opt-in cloud provider with user API key
- Handle all edge cases:
- Skip images that already have meaningful alt text
- Handle encrypted resources gracefully
- Handle large images (resize before sending if needed)
- Handle OpenAI rate limiting with retry logic
- Write tests for edge case handling
Week 7–8 (July 6 – July 19) Settings UI and Tools Menu
- Implement plugin settings using joplin.settings:
- Provider selection (Ollama / OpenAI)
- Ollama model name (configurable)
- OpenAI API key (secure field)
- Privacy warning for cloud provider
- Add "Generate Alt Text (AI)" command to Tools menu
- Add progress notifications during processing
- Add clear error messages (Ollama not running, API key missing, etc.)
Week 9–10 (July 20 – August 2) Batch Processing and Auto-labeling
- Extend from current-note mode to scan-all-notes mode
- Implement auto-labeling on new image insertion using joplin.workspace events
- Cross-platform QA on Windows and Linux via virtual machine; macOS with community assistance
- Fix any UI or behavior inconsistencies found during testing
Week 11 (August 3 – August 9) Mentor Feedback and Optimization
- Address all mentor code review feedback
- Optimize performance for large note collections
- Improve prompt engineering for more accurate descriptions
- Conduct accessibility audit of the plugin UI itself
Week 12 (August 10 – August 18) Final Submission
- Write complete user documentation and setup guide
- Publish plugin to the official Joplin plugin repository
- Submit final pull request
- Write final GSoC progress blog post
10. Deliverables
At the end of this project the following will exist:
- Complete Joplin plugin published to the official Joplin plugin repository
- AltTextService.ts — scans image resources in current note and all notes
- OllamaProvider.ts — local AI provider, fully private, no API key needed
- OpenAIProvider.ts — cloud AI provider, opt-in, user provides own key
- NoteUpdater.ts — safely writes AI descriptions as alt text into note bodies
- ReviewDialog.ts — user review dialog with Accept, Edit, Skip per image
- Plugin settings UI with provider selection, API key configuration, and privacy warning
- "Generate Alt Text (AI)" command in Joplin's Tools menu
- Auto-labeling of newly added images via workspace events
- Manual test coverage across all supported image types and editor modes; unit tests for core classification and parsing logic
- User documentation explaining setup for both Ollama and OpenAI
- Verified functionality on Windows, macOS, and Linux
11. Availability
- Weekly availability: 20–25 hours/week (175-hour medium project across 12 weeks)
- Time zone: IST (UTC+5:30), Bengaluru, India
- Other commitments: I have semester-end examinations in June which I am confident I can manage alongside GSoC commitments. No other planned vacations or internships during the coding period.
- Communication plan: I will contact my mentor a minimum of three times per week, post weekly progress updates on the Joplin forum, and maintain an up-to-date progress blog. I understand that lack of communication results in failing the programme.
- Remote work: I am comfortable working independently under a remote mentor across time zones. I have already done this through my prior Joplin contributions, communicating with maintainers asynchronously via GitHub and the Joplin forum.
- Language: My native language is Telugu. I am fully comfortable working and communicating in English.
12. Benefits to Joplin Community
This plugin directly advances Joplin's stated accessibility goals. Users who rely on screen readers will, for the first time, receive meaningful descriptions of images in their notes. The dual-provider design (Ollama local + OpenAI opt-in) ensures every user can benefit regardless of technical setup or budget, with privacy preserved by default. The plugin will be published to the Joplin plugin repository, making it available to all Joplin users immediately upon completion.
GSoC project idea: Automatically label images using AI
