With this solution, you can dictate text directly from your iOS/watchOS/Android device and have it automatically added to your Joplin notebook.
It is ideal for users who need to quickly jot down notes while on the go or for those who prefer dictation over typing.
The transcription will be available in your favorite notebook in the desktop app once you're back at your PC.
Disclaimer
Don't expect to find an installation guide at the end of this overview.
Whether a full guide gets written depends on interest: likes on this post help me gauge the potential impact and decide whether to move forward.
Features
Hands-free dictation: Seamlessly dictate text directly to your Joplin notebooks.
Works with Windows + iOS/Android: Use your iPhone/Apple Watch/Android device as a microphone to add notes wherever you are.
Fast and efficient: Say goodbye to typing - just speak and see your text added to your Joplin notebook when you're back at your desktop.
Dashboard: Get an overview of your voice inbox in your favorite note.
Prerequisites
Joplin Desktop (Windows)
iOS/watchOS/Android device
OpenAI API token for speech recognition (very cheap, quick setup, guide included)
Components
OpenAI account with at least $5 of credit - used for speech recognition. In my experience it costs ~1 cent per dictation on average.
iPhone/Apple Watch/Android - we'll create a shortcut here that records your speech.
AirTable - a web service that stores our transcriptions, with a generous free tier (no payment card required), both in-transit and at-rest encryption, and up to 2 weeks of snapshots for data recovery.
Joplin Desktop app (Windows) - we'll configure it to listen to incoming requests using its Web Clipper API.
Note Overview plugin for Joplin (optional) - gives us a kind of dashboard where your voice inbox is displayed.
Python Sync Script - copied from the repo to somewhere on your PC. It reads all available transcriptions from AirTable and pushes them to a specific Joplin notebook (the Joplin API call is sketched just below).
Task Scheduler (Windows) - a native Windows tool that runs the synchronization script automatically when you log in to the PC, or on a configured schedule.
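To make the Joplin side concrete, here is a minimal sketch (not the author's actual script) of how a sync script can create a note through Joplin's Web Clipper API (the Data API). Port 41184 is the Clipper service's default; the token, notebook ID, and function name are placeholders.

```python
import requests

JOPLIN_API = "http://127.0.0.1:41184"         # default Web Clipper service port
JOPLIN_TOKEN = "YOUR_JOPLIN_CLIPPER_TOKEN"    # Tools -> Options -> Web Clipper
NOTEBOOK_ID = "YOUR_VOICE_INBOX_NOTEBOOK_ID"  # ID of the target notebook

def add_note_to_joplin(title: str, body: str) -> None:
    """Create a note in the configured notebook via Joplin's Data API."""
    resp = requests.post(
        f"{JOPLIN_API}/notes",
        params={"token": JOPLIN_TOKEN},
        json={"title": title, "body": body, "parent_id": NOTEBOOK_ID},
        timeout=10,
    )
    resp.raise_for_status()
```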
How It Works
On your phone, you tap a button or trigger the shortcut (automation) by voice
Your phone records the audio
Your phone asks the OpenAI Whisper model to transcribe the recording
Your phone sends the transcription to the AirTable database
When you log in to your PC (or on a schedule), the Python Sync Script asks AirTable for available notes
The script adds all collected notes to the configured Joplin Desktop notebook through the Web Clipper API
The script marks collected notes as "Processed" in the AirTable database so they are not processed the next time the script is triggered
The script keeps the total number of transcriptions in AirTable at around 900 records to stay within the free tier limits (a rough sketch of these AirTable calls follows this list)
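A rough sketch of the AirTable half of the sync (the table name, field names, and the "Status" flag are illustrative, not the author's actual schema):

```python
import requests

AIRTABLE_BASE = "YOUR_BASE_ID"
AIRTABLE_TABLE = "Transcriptions"   # illustrative table name
AIRTABLE_URL = f"https://api.airtable.com/v0/{AIRTABLE_BASE}/{AIRTABLE_TABLE}"
HEADERS = {"Authorization": "Bearer YOUR_AIRTABLE_TOKEN"}

def fetch_unprocessed() -> list[dict]:
    """Return records that have not been marked as Processed yet."""
    resp = requests.get(
        AIRTABLE_URL,
        headers=HEADERS,
        params={"filterByFormula": "NOT({Status} = 'Processed')"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("records", [])

def mark_processed(record_ids: list[str]) -> None:
    """Flag records as Processed so the next run skips them."""
    for i in range(0, len(record_ids), 10):   # AirTable updates at most 10 records per request
        batch = record_ids[i:i + 10]
        resp = requests.patch(
            AIRTABLE_URL,
            headers=HEADERS,
            json={"records": [{"id": rid, "fields": {"Status": "Processed"}} for rid in batch]},
            timeout=10,
        )
        resp.raise_for_status()
```

Each record's fields dictionary would carry the transcription text, which the script then passes to the Joplin call sketched in the Components section.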
Dashboard
Using the "Note Overview" plugin you can filter your voice notes inbox and display it in a table.
Example configuration:
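If you haven't used the plugin before, the dashboard is just a note containing a Note Overview block along these lines (the notebook name and fields are placeholders; the exact options are described in the plugin's documentation):

```
<!-- note-overview-plugin
search: notebook:"Voice Inbox"
fields: updated_time, title
sort: updated_time DESC
-->
```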
If you lock your PC each time you step away, it makes sense to run the synchronization on system login.
If you don't use a lock screen, it makes sense to run the synchronization on a schedule.
In either scenario, it helps to understand the estimated Sync/day to Notes/day ratio.
Because of the AirTable free tier's monthly limits, here's a rough estimate of how many voice notes per day you can make for a given number of daily logins (synchronizations):
| Sync/day | Notes/day |
|----------|-----------|
| 3 | 26 |
| 4 | 24 |
| 5 | 22 |
| 6 | 20 |
| 7 | 18 |
| 8 | 16 |
| 9 | 14 |
| 10 | 12 |
| 11 | 10 |
| 12 | 8 |
| 13 | 6 |
| 14 | 4 |
| 15 | 2 |
These limits are rough estimates assuming a heavy user who dictates every day of the given month. Each day off slightly increases the number of voice notes you can make before the end of the month.
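For the curious: the table follows a simple linear pattern, roughly Notes/day ≈ 32 − 2 × Sync/day. One hypothetical back-of-the-envelope model that reproduces it assumes the AirTable free tier allows about 1,000 API calls per month, that each dictation costs one call (the phone inserts one record) and each sync costs two calls (one list request plus one batch update), over a 31-day month. The real limits and call counts may differ.

```python
# Hypothetical budget model - reproduces the Sync/day vs Notes/day table above.
MONTHLY_CALLS = 1000   # assumed free-tier API-call allowance per month
DAYS = 31
CALLS_PER_NOTE = 1     # one AirTable insert per dictation (assumption)
CALLS_PER_SYNC = 2     # one list + one batch update per sync (assumption)

for syncs_per_day in range(3, 16):
    daily_budget = MONTHLY_CALLS / DAYS - CALLS_PER_SYNC * syncs_per_day
    notes_per_day = int(daily_budget // CALLS_PER_NOTE)
    print(f"{syncs_per_day} syncs/day -> ~{notes_per_day} notes/day")
```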
How To Set Up?
As mentioned in the disclaimer, preparing a setup guide that's easy for the average Joplin user to follow would take the author some time.
If you like the idea, please like the post so the author can gauge the potential impact, and bookmark this thread to stay updated in case the author decides to add steps to the How To Set Up? section.
Since you have planned this solution - which you will likely benefit from using yourself - why not start analyzing its potential impact and drafting a setup guide now? This will ensure it works well for you and expedite others' ability to try it.
Well, I already have a working setup that uses self-hosted, dockerized JSON storage instead of AirTable. It has some limitations and possible data loss, which I'm fine with, but I'm not sure others would be.
I was writing up the steps to replicate it when I realized things were getting too complicated for the average Joplin user.
So I ended up testing the AirTable API and realized it wouldn't be too much hassle to replace the self-hosted storage with AirTable. On the other hand, I would need to spend at least two days testing it and writing a guide for it.
So before going all in, I decided to post the design here to see a few things:
how many people are interested in it;
get an actual review of the design to identify its flaws;
perhaps there are existing solutions that I couldn't find on the forum;
perhaps somebody knows how OpenAI can be easily replaced with a free solution;
perhaps Joplin plans to release a native feature soon.
You don't have to give a full-fledged tutorial, but just some more details would help:
1.) What does your iOS shortcut look like? I only see Create Recording, but I don't see an option to save that to a file or how to send it to OpenAI.
2.) Would you mind sharing the Python script (possibly along with the iOS automation)?
Just some thoughts on how I might modify your idea:
1.) I think OpenAI is triggered by your iOS shortcut, right? I might actually just do the recording and upload it somewhere. The post-processing would be done by a script. That has the advantage that I can still record when I don't have internet (e.g., when I'm hiking and get an idea).
2.) Instead of the Web Clipper API on Windows, I'm considering using joplin-cli in a cron job on my Linux server. The advantage is that it's always on and, I think, simpler.
I think your point about hiking is valid. There should be a way to delay the OpenAI transcription by storing the file locally and sending it for transcription later, once a connection is back.
Also, I did not realize that Joplin Server doesn't have the Web Clipper API. Instead of Web Clipper API calls in the script, we could extend it to use the Joplin CLI depending on the Joplin deployment type.
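For the deferred-transcription idea, the later step could look roughly like this. This is only a sketch: the directory layout and file extension are made up, and it uses OpenAI's standard audio transcription endpoint with the whisper-1 model.

```python
import pathlib
import requests

OPENAI_KEY = "YOUR_OPENAI_API_KEY"
PENDING_DIR = pathlib.Path("recordings/pending")      # recordings saved while offline
DONE_DIR = pathlib.Path("recordings/transcribed")

def transcribe_pending() -> None:
    """Send locally stored recordings to Whisper once connectivity is back."""
    DONE_DIR.mkdir(parents=True, exist_ok=True)
    for audio in sorted(PENDING_DIR.glob("*.m4a")):
        with audio.open("rb") as f:
            resp = requests.post(
                "https://api.openai.com/v1/audio/transcriptions",
                headers={"Authorization": f"Bearer {OPENAI_KEY}"},
                data={"model": "whisper-1"},
                files={"file": (audio.name, f)},
                timeout=120,
            )
        resp.raise_for_status()
        text = resp.json()["text"]
        print(audio.name, "->", text[:80])             # hand off to AirTable upload, etc.
        audio.rename(DONE_DIR / audio.name)
```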
P.S. I'm not ready to share the Python script yet. I can give you an idea of what the iOS shortcut looks like plus which AirTable API endpoints are triggered in the Python script.
iOS Shortcut (note that the FIFO queue means the AirTable JSON storage in the discussed design):
While I generally prefer typing to speech for note-taking, I've often wanted speech input for short, quick things (e.g., "Parked in space A-23"). I don't think, though, I'd use one that has significant latency (e.g., needs connectivity), as such notes also tend to have a short useful lifetime. Understanding that it's probably a considerably larger task and would require more installed resources, I'd much rather have something that runs locally on my device.
I agree. A native mobile shortcut exposed by the Joplin app itself would be great, not only for long-term note-taking (with the intent of later dispatching notes to designated notebooks) but also for short-term scenarios like the one you described.
Unfortunately, I don't see any traction regarding mobile app shortcuts, so for now I'm suggesting at least a long-term solution where you sort your voice notes once you're at the PC.
I hadn't realized that such a use case really exists; I'll make it more obvious which use case this design targets.
Thank you for reviewing!