What I did
- I ran the embedding model with Transformers.js in the plugin environment.
- I ran both BGE-small-en-v1.5 and all-MiniLM-L6-v2 to compare their speeds, with some useful findings.
- I wrote a Python script to fetch some documents (around 1500) from the MediaWiki API for testing.
- I ran the embedding models on data sets of different sizes (from 125 up to 3000 notes; see the tables below).
BGE-small-en-v1.5
| Count | Avg per Note (ms) | Total Time (s) |
|---|---|---|
| 125 | 486 | 60.4 |
| 250 | 566 | 141.1 |
| 500 | 516 | 257.6 |
| 1000 | 717 | 716.7 |
| 1500 | 760 | 1139.5 |
all-MiniLM-L6-v2
| Count | Avg per Note (ms) | Total Time (s) |
|---|---|---|
| 150 | 377 | 56.2 |
| 500 | 396 | 197.8 |
| 1000 | 318 | 318.1 |
| 1500 | 392 | 587.6 |
| 3000 | 382 | 1147.1 |
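As a sanity check on the tables, the per-note average is just the total time divided by the note count. A quick script with the MiniLM numbers copied from above:

```javascript
// Sanity check: avg per note (ms) ≈ total time (s) * 1000 / count.
// Values copied from the all-MiniLM-L6-v2 table above.
const runs = [
  { count: 150, totalS: 56.2 },
  { count: 500, totalS: 197.8 },
  { count: 1000, totalS: 318.1 },
  { count: 1500, totalS: 587.6 },
  { count: 3000, totalS: 1147.1 },
];

const avgMs = ({ count, totalS }) => (totalS * 1000) / count;

for (const run of runs) {
  console.log(run.count, avgMs(run).toFixed(1));
}
```

The small differences from the table's "Avg per Note" column come from rounding during measurement.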
I have added all the images of the results to the GitHub repo README: testing-embedding-model
My findings: If we proceed with BGE-small, a first-time user would wait 10–15 minutes for embeddings to complete, which creates significant friction. MiniLM reduces this to a few minutes (roughly half the time for the same number of notes), making the experience far more responsive and practical. Both models share the same max token limit (512) and output dimension (384), so the quality loss is minimal (quality matters less here, since we only need to generate a name for each note) while the speed gain is very large.
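Because both models output 384-dimensional vectors, the downstream comparison code is identical whichever model we pick. A minimal sketch of that comparison step (cosine similarity), with toy vectors standing in for real embeddings:

```javascript
// Cosine similarity between two embedding vectors. Both models output
// 384-dim vectors, so this works unchanged for either one.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy stand-ins for real 384-dim embeddings.
const dim = 384;
const noteVec = Array.from({ length: dim }, (_, i) => Math.sin(i));
const queryVec = Array.from({ length: dim }, (_, i) => Math.sin(i + 0.1));

console.log(cosineSimilarity(noteVec, noteVec).toFixed(3)); // identical vectors → 1.000
console.log(cosineSimilarity(noteVec, queryVec));
```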
Note: The times above were measured without chunking, which means the notes were truncated rather than fully embedded. With chunking enabled, total processing time will increase further.
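For context, chunking means splitting a note into pieces that each fit the 512-token limit and embedding every piece. A naive whitespace-based sketch (a real implementation would use the model's own tokenizer, so these counts are only approximate):

```javascript
// Naive chunker: split text into chunks of at most maxTokens whitespace
// "tokens". A real implementation would use the model's tokenizer, so this
// only approximates the 512-token limit.
function chunkText(text, maxTokens = 512) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let i = 0; i < words.length; i += maxTokens) {
    chunks.push(words.slice(i, i + maxTokens).join(' '));
  }
  return chunks;
}

// A 1200-word note becomes 3 chunks, i.e. 3 embedding calls instead of 1,
// which is why total processing time grows once chunking is enabled.
const note = Array(1200).fill('word').join(' ');
console.log(chunkText(note).length); // 3
```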
Regarding async behavior: The embedding runs fully async inside a Web Worker, so it does not block Joplin's main UI thread; the user can continue editing notes while embeddings are processed in the background.
Screen recording:
I am not able to upload a large video file here (the limit is 10 MB), so I have uploaded the full video to the GitHub README.
A few more things I tried:
- Batching: I tried embedding notes in batches of 4. It was significantly slower, likely due to increased memory pressure in the WASM runtime.
- GPU acceleration: I attempted to use WebGPU, but it failed. I suspect it is not supported in Joplin's plugin sandbox environment; I will give it another try.
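The batching experiment above can be sketched like this, with a stub standing in for the real Transformers.js call:

```javascript
// Batched embedding loop, as tried with batch size 4. embedBatch is a stub
// standing in for the real Transformers.js pipeline call.
function embedBatch(texts) {
  // Stub: return one fake 384-dim vector per input text.
  return texts.map(() => new Float32Array(384));
}

function embedAll(notes, batchSize = 4) {
  const vectors = [];
  for (let i = 0; i < notes.length; i += batchSize) {
    const batch = notes.slice(i, i + batchSize);
    vectors.push(...embedBatch(batch));
  }
  return vectors;
}

console.log(embedAll(['a', 'b', 'c', 'd', 'e']).length); // 5
```

With the real model, each `embedBatch` call holds several inputs in memory at once, which is the likely source of the WASM memory pressure mentioned above.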
I don't want initial indexing to take this long, so I am figuring out how to make it more efficient.