Voice memos to Joplin text notes

Started using Joplin two weeks ago and am genuinely impressed, mostly with its customizability. Installed only one plugin because most of the needed functionality is covered by the customizable base.
As part of my workflow I use voice memos, and being able to send those to Joplin as text is very appealing to me. So I put together a command-line utility for Linux (also usable from the GNOME desktop) that records voice memos, transcribes them to text and creates new Joplin notes (using the Data API).
It uses offline speech recognition based on whisper.cpp and, surprisingly, is quite usable and practical.
Please take a look: QuantiusBenignus/NoteWhispers on GitHub (voice memos recorded from the microphone, transcribed offline to text and converted to Joplin notes)

Definitely not a one-click solution but the setup process, just like Joplin, is quite customizable and relatively easy to follow. The shell script (zsh or bash) will do some sanity checks and provide guidance, even if one does not follow the README file.


Until now, if Joplin was not running, this little tool would save the transcribed speech notes as JSON files stored temporarily in the user's config/resources folder. The idea was to pick them up later and insert them into Joplin. This has now been implemented.
When a new voice memo is successfully turned into a Joplin note (i.e. the Clipper service is up), the tool also picks up the temporarily stored notes and inserts them into Joplin.
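
For anyone curious about the queue-and-flush logic, here is a minimal sketch in Python. It is not the tool's actual code (that is a zsh/bash script using curl). The names `send_or_queue`, `post_note` and `PENDING_DIR` are made up for illustration, the pending folder path is an assumption, and the URL assumes the Web Clipper service on its default port 41184.

```python
import json
import time
import urllib.request
import uuid
from pathlib import Path

JOPLIN_URL = "http://localhost:41184"  # default Web Clipper port (assumed unchanged)
# Hypothetical folder for notes waiting for Joplin; the real tool uses its own location.
PENDING_DIR = Path.home() / ".config" / "notewhispers-pending"


def post_note(note, token, base_url=JOPLIN_URL):
    """POST one note, e.g. {"title": ..., "body": ...}, to the Joplin Data API."""
    req = urllib.request.Request(
        f"{base_url}/notes?token={token}",
        data=json.dumps(note).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)


def send_or_queue(note, send, pending_dir=PENDING_DIR):
    """Send a note, or save it as JSON if Joplin is unreachable.

    After a successful send, previously saved notes are sent too and their
    files removed. Returns the number of notes delivered by this call.
    """
    pending_dir = Path(pending_dir)
    try:
        send(note)
    except OSError:  # urllib's URLError and "connection refused" are OSErrors
        pending_dir.mkdir(parents=True, exist_ok=True)
        name = f"{time.time_ns()}-{uuid.uuid4().hex}.json"
        (pending_dir / name).write_text(json.dumps(note), encoding="utf-8")
        return 0

    delivered = 1
    if pending_dir.is_dir():
        for path in sorted(pending_dir.glob("*.json")):  # oldest first
            try:
                send(json.loads(path.read_text(encoding="utf-8")))
            except OSError:
                break  # Joplin went away again; keep the remaining files
            path.unlink()
            delivered += 1
    return delivered
```

Usage would look something like `send_or_queue({"title": "Memo", "body": text}, lambda n: post_note(n, token="YOUR_TOKEN"))`, with the token taken from the Web Clipper options in Joplin.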

A question: Is there a straightforward way (outside of loading plugin code - I like it lean(er) and mean) to sync such "incoming" notes automatically from the Joplin side, say on startup?

Thanks for your post! It inspired me to read up a bit on the current state of (offline and Google-free) speech-to-text apps, even though I barely use them. I had good results with ideasman42/nerd-dictation on GitHub (simple, hackable offline speech to text using the VOSK-API), even with my German dialect :smiley:

The Whisper model seems to perform better and needs only one model for multiple languages. I'm curious about the results when trying it with your script.

About the syncing: You might take a look at the REST uploader (if I understood the question correctly). Though it's only uploading rather than syncing.


Thanks for the reply!
I like the reference to the REST uploader. I also use the Joplin REST API (with curl as a client, both for uploading on creation and for "syncing" by uploading the temp files), so I could integrate the uploader into my workflow or borrow from it.

Concerning Whisper: yes, I have been passively monitoring the state of the art in the field for years (e.g. in the early 2000s I used the MS Speech SDK to input formulae in Mathematica; I know, a real dinosaur here) and IMHO, with the transformer models it has now reached critical "offline" mass. I use an AMD A10 APU (circa 2012) with the 'tiny' or 'base' English-only models (~300 to 500 MB in memory; I think Whisper is very well versed in German too) and I find it more than acceptable. An AMD Ryzen with 6 cores makes it an essentially trivial, highly accurate task. Give it a try. Looking at your Joppy, I think it would be a simple matter for you to quickly get all the parts up and running, plus I would get a knowledgeable and sympathetic beta tester :-)
