I did go through it and ran both embedding models (BGE-small-en-v1.5 and all-MiniLM-L6-v2) with transformers.js in the plugin environment.
My findings: BGE-small takes roughly twice as long to embed the same notes as all-MiniLM-L6-v2. Both models share the same max token limit (512) and output dimension (384), so the quality loss is minimal while the speed gain is very large. (The token limit for MiniLM is a little confusing, since its documentation does not state it explicitly.)
I have added all the images of the results to the GitHub repo README: testing-embedding-model
I have explained it in more detail: here
The architecture does not rule it out, but one thing would need to change: the vector storage. I am currently using Vectra, which stores embeddings as local files on disk. Mobile plugins don't have filesystem access, so that would break. To fix this I can use Joplin's built-in data API, which stores data directly on notes.
Other barrier:
Ollama will not work on mobile, but OpenAI and Gemini will work perfectly.