Thanks for your post! It inspired me to read up a bit on the current state of (offline and Google-free) speech-to-text apps, even though I barely use them. I had good results with nerd-dictation (ideasman42/nerd-dictation on GitHub: "Simple, hackable offline speech to text", using the VOSK-API), even with my German dialect.
The Whisper model seems to perform better and needs only one model for multiple languages. I'm curious to see the results with your script.
About the syncing: you might take a look at the REST uploader (if I understood the question correctly), though it only uploads rather than syncs.