"Spoken" is my attempt to expand on the utility of NoteWhispers (voice memos to Joplin text notes) by adding support for inserting spoken (transcribed) to-dos into Joplin, with auto-parsed alarms. It mostly works (given a good microphone, good diction, the benevolence of the ASR gods, some practice and reading TFM), but it will probably remain a toy-like curiosity for most people, especially those who type fast.
Still, if you want to play with it, it is here: GitHub - QuantiusBenignus/Spoken: Joplin text notes and to-dos via natural speech. Input from the microphone or audio files. Output to Joplin or clipboard. Transcribed text automatically parsed for time reference in attempt to set a to-do alarm.
Written completely in zsh on Linux, it requires zsh for best date parsing, although there is also a barebones bash translation. (I know, I could have written an external parser, but NLP with ZSH sounds like enormous fun ;)
Question: While creating the to-do JSON payload for the API, I noticed that Joplin ignores the "todo_due" key and I had to resort to an additional PUT call to set the alarm. Is this expected behavior, i.e. not implemented? Haven't looked at the Joplin source code yet.
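For reference, the two-call workaround described above could look roughly like the sketch below. It is not taken from the Spoken source: the helper names are hypothetical, `sed` stands in for a proper JSON tool to keep the example self-contained, and the base URL assumes the default port of Joplin's Web Clipper service. The response is assumed to be compact JSON containing an `"id"` field.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical. Port 41184 is assumed to be
# the local Joplin Web Clipper service; the token comes from its settings.
JOPLIN_URL="${JOPLIN_URL:-http://localhost:41184}"
JOPLIN_TOKEN="${JOPLIN_TOKEN:-}"

# Escape backslashes and double quotes so text can sit inside a JSON string.
# Assumes single-line input without control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Payload for the initial POST: a note flagged as a to-do.
todo_payload() {
  printf '{"title":"%s","body":"%s","is_todo":1}' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# Payload for the follow-up PUT: due time in milliseconds since the epoch.
alarm_payload() {
  printf '{"todo_due":%s}' "$1"
}

# Create the to-do, read back its id, then set the alarm in a second call.
create_todo_with_alarm() {
  local title=$1 body=$2 due_ms=$3 id
  id=$(curl -s -X POST --data "$(todo_payload "$title" "$body")" \
         "$JOPLIN_URL/notes?token=$JOPLIN_TOKEN" |
       sed -n 's/.*"id":"\([^"]*\)".*/\1/p')
  [ -n "$id" ] || return 1
  curl -s -X PUT --data "$(alarm_payload "$due_ms")" \
    "$JOPLIN_URL/notes/$id?token=$JOPLIN_TOKEN" >/dev/null
}
```

A call such as `create_todo_with_alarm "Call the dentist" "Transcribed note body" 1704128400000` would then issue the POST followed by the PUT. If a later release accepts "todo_due" on creation, the second call could simply be dropped and the key merged into the first payload.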
Thanks.
Very nice tool, thanks for sharing. Indeed the fact that it's written in zsh means not many people will be able to use it. Any reason you didn't directly package it as a Joplin plugin?
Question: While creating the to-do JSON payload for the API, I noticed that Joplin ignores the "todo_due" key and I had to resort to an additional PUT call to set the alarm.
I believe that will be fixed in the next release
The shell (zsh and bash here) is the glue fabric, already sitting there and available to connect a zoo of executable components:
1. Memory- and CPU-hungry C/C++ port of the whisper ASR engine
2. Sox or ffmpeg for audio recording/processing
3. jp for light json work
4. a text parser and formatter to prepare the transcription and extract time scheduling info
5. http client (curl) for access to the Joplin API
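The way the shell glues such components together can be sketched as follows. This is an illustration, not the actual Spoken script: the function names, the whisper.cpp binary path and the model path are all assumptions, and the final hop to Joplin via curl is left out. Only the recording, transcription and text clean-up stages are shown.

```shell
#!/usr/bin/env bash
# Illustrative only, not the Spoken source.
# Binary and model locations are assumptions; adjust to the local build.
WHISPER_BIN="${WHISPER_BIN:-$HOME/whisper.cpp/main}"
WHISPER_MODEL="${WHISPER_MODEL:-$HOME/whisper.cpp/models/ggml-base.en.bin}"

# Record N seconds (default 10) as 16 kHz, mono, 16-bit WAV with sox's rec.
record_clip() {
  rec -q -c 1 -r 16000 -b 16 "$1" trim 0 "${2:-10}"
}

# Run whisper.cpp on the file; -nt suppresses timestamps, logs are discarded.
transcribe() {
  "$WHISPER_BIN" -m "$WHISPER_MODEL" -f "$1" -nt 2>/dev/null
}

# Read the raw transcript on stdin: drop [BLANK_AUDIO]-style tags,
# squeeze runs of whitespace and trim both ends.
clean_transcript() {
  tr '\n' ' ' |
    sed -E -e 's/\[[^]]*\]//g' -e 's/[[:space:]]+/ /g' -e 's/^ //' -e 's/ $//'
}

# Record, transcribe and clean; the text is printed for the next stage.
spoken_text() {
  local wav="${TMPDIR:-/tmp}/spoken_$$.wav"
  record_clip "$wav" "${1:-10}" && transcribe "$wav" | clean_transcript
  rm -f "$wav"
}
```

Each stage is a small function around one external executable, so swapping sox for ffmpeg or changing the ASR model touches a single function and leaves the rest of the pipeline alone.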
IMHO, it would make sense to attack this the plugin way if I could make it really platform independent.
Writing items 3, 4 and 5 in JavaScript/TypeScript would be relatively easy; handling item 2 in a platform-independent way, maybe not so much. As for item 1, I will just have to wrap it and live with the separate executable or library, which needs to be compiled with per-hardware/platform optimizations for decent performance anyway. I will read up on the plugin architecture for sure, as I have other automation ideas for Joplin, but this concept may not be the best candidate. Great to hear that the "todo_due" key will be "externally set-able" in the next release. This use scenario (externally set alarms) may not be common enough to warrant it, but it will save this script a few ms of interaction time.
Thanks for the reply and kudos on Joplin - one of the most flexible and versatile PIM tools that I have ever encountered!
I wonder if it can be compiled to WASM with Emscripten.
But then last time I tried I could not use a WASM module from a plugin.
whisper.cpp has a WASM port already (whisper.cpp/examples/whisper.wasm at master · ggerganov/whisper.cpp · GitHub), but I understand the performance takes a serious hit. It also requires that the browser engine (I guess the Electron core in this case) supports 128-bit SIMD instructions for the speed not to be a joke. And then there is the question of using it from a plugin; I have to finally read up on how the Joplin plugins plug in :-)
BTW, speaking of whisper.cpp, the tech behind this notes/to-dos transcriber, the New Yorker has an interesting read that I think gives it the credit it deserves. For those curious: Whispers of A.I.’s Modular Future | The New Yorker. I think whisper.cpp is especially well suited for the transcription of Joplin notes and to-dos because they typically result in short snippets of text, so the load on the CPU/memory is manageable and the output is near real-time. In any case, given its resource footprint, keeping this tool external (not a plugin), using the Linux shell as the orchestrator and transporting the result via the REST API of Joplin, is probably the most efficient method resource-wise. Instead of a plugin, I will likely attempt to write an external text pre-processor (for datetime parsing) in, say, Rust, so that the rest can be written as a shell-independent script, in a syntax that is uniform across zsh, bash and other shells. That is the flexibility that Linux affords us; doing something similar under Windows, so close to the metal, will be, IMHO, challenging.
If you're limiting it to Linux only, you might as well write a native module using for instance Neon and call it either from Joplin or, more likely, from a plugin.
Neon is very easy to use and you can write Rust and call it from JS.
Thanks for the suggestion! Neon seems interesting, I am also considering Tauri. But the workflow does not call for a GUI (aside from the existing desktop menu), so short term, I will focus on writing a semi-robust CLI text parser for the to-do scheduler, so that the rest of the logic can be written in a single, polyglot shell script that does not need to rely on extended, shell-specific globbing and mandate translation across shells, as is now the case.
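To make the idea of a standalone time parser concrete, here is a deliberately naive sketch of the interface such a tool might expose: transcript text in, a due time in milliseconds out. It is not the parser used by Spoken and not the planned CLI tool. The function names are hypothetical, only three phrase shapes are recognized, the matching is not word-bounded, and the final conversion assumes GNU `date` (the `-d` option is not POSIX).

```shell
#!/bin/sh
# Naive illustration; function names are hypothetical.

# Print the first recognizable time phrase found in the text, if any.
# Handles "in N minutes/hours/days/weeks", "today|tomorrow [at H[:MM] [am|pm]]"
# and "at H[:MM] [am|pm]".
extract_when() {
  printf '%s\n' "$1" |
    grep -oiE 'in [0-9]+ (minute|hour|day|week)s?|(today|tomorrow)( at [0-9]{1,2}(:[0-9]{2})?( ?(am|pm))?)?|at [0-9]{1,2}(:[0-9]{2})?( ?(am|pm))?' |
    head -n 1
}

# Turn the phrase into something GNU date understands:
# lowercase it, drop the filler words "in"/"at", join "5 pm" into "5pm".
to_date_expr() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' |
    sed -E -e 's/^in //' -e 's/ at / /' -e 's/^at //' -e 's/ (am|pm)$/\1/'
}

# Print the due time in milliseconds since the epoch, or fail if no
# time phrase was found. Requires GNU date.
due_ms() {
  phrase=$(extract_when "$1")
  [ -n "$phrase" ] || return 1
  secs=$(date -d "$(to_date_expr "$phrase")" +%s) || return 1
  printf '%s000\n' "$secs"
}
```

For example, `due_ms "Call the dentist tomorrow at 5 pm"` would hand `tomorrow 5pm` to `date -d`. Because nothing here depends on zsh globbing or bash-only features, the same file can be sourced from either shell, which is the property a single polyglot script would need.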