Joplin Voice-to-Note Solution (iOS/Android + Windows)

Make your Joplin notebooks hands-free!

With this solution, you can dictate text on your iOS, watchOS, or Android device and have it added to your Joplin notebook automatically.
It is ideal for users who need to jot down notes quickly while on the go, or who prefer dictation over typing.
The transcription will be waiting in your favorite notebook in the desktop app once you're back at your PC.

Disclaimer

This overview does not end with an installation guide.
I'm using the likes on this post to gauge interest and decide whether to move forward.

Features

  • Hands-free dictation: Seamlessly dictate text directly to your Joplin notebooks.
  • Works with Windows + iOS/Android: Use your iPhone, Apple Watch, or Android device as a microphone to add notes wherever you are.
  • Fast and efficient: Say goodbye to typing. Just speak, and your text is in your Joplin notebook when you're back at your desktop.
  • Dashboard: Get an overview of your voice inbox in your favorite note.

Prerequisites

  • Joplin Desktop (Windows)
  • iOS/watchOS/Android device
  • OpenAI token for speech recognition (very cheap, quick setup, guide included)

Components

  • OpenAI account with a balance of at least $5 - used for speech recognition. In my experience it costs about 1 cent per dictation on average.
  • iPhone/Apple Watch/Android - we'll create a shortcut here that records your speech.
  • Airtable - a web service that stores our transcriptions. It has a generous free tier (no payment card required), encryption both in transit and at rest, and up to 2 weeks of snapshots for data recovery.
  • Joplin Desktop app (Windows) - we'll configure it to listen for incoming requests through its Web Clipper API.
  • Note Overview plugin for Joplin (optional) - gives us a kind of dashboard that displays your whole voice inbox.
  • Python Sync Script - can be copied from the repo to anywhere on the PC. It reads all available transcriptions from Airtable and pushes them to a specific Joplin notebook.
  • Task Scheduler (Windows) - a native Windows app that lets us run the synchronization script automatically when you log in to the PC, or on a configured schedule.

How It Works

  • On your phone, you tap a button or trigger the shortcut (automation) by voice.
  • Your phone records the audio.
  • Your phone asks the OpenAI Whisper model to transcribe the recording.
  • Your phone sends the transcription to the Airtable database.
  • When you log in to the PC (or on a schedule), the Python Sync Script asks Airtable for available notes.
  • The script adds all collected notes to the configured notebook in Joplin Desktop through the Web Clipper API.
  • The script marks the collected notes as "Processed" in the Airtable database so they are not processed again the next time the script runs.
  • The script keeps the total number of transcriptions in Airtable at around 900 records to comply with free tier limits.
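
The PC-side steps above can be sketched roughly as follows. This is only an illustration, not the actual sync script: the function names are made up for this example, the port 41184 assumes Joplin's default Web Clipper setting, the 50-character title is an arbitrary choice, and pruning old records is left out.

```python
import json
import urllib.request

JOPLIN_URL = "http://localhost:41184"  # assumed default Web Clipper port


def create_joplin_note(text, notebook_id, token):
    """Create one note in the given notebook via Joplin's local Data API."""
    # The title is just the first 50 characters of the transcription (arbitrary choice).
    note = {"title": text[:50] or "Voice note", "body": text, "parent_id": notebook_id}
    request = urllib.request.Request(
        f"{JOPLIN_URL}/notes?token={token}",
        data=json.dumps(note).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def sync_once(fetch_unprocessed, create_note, mark_processed):
    """Push unprocessed transcriptions to Joplin, then flag them as processed.

    fetch_unprocessed() returns records shaped like {"id": ..., "fields": {"text": ...}},
    create_note(text) stores one note, mark_processed(ids) flags records in the storage.
    """
    synced_ids = []
    for record in fetch_unprocessed():
        create_note(record["fields"].get("text", ""))
        synced_ids.append(record["id"])
    if synced_ids:
        # Flag records only after all notes exist. If a run fails midway, nothing is
        # flagged and the next run retries everything (possibly duplicating notes).
        mark_processed(synced_ids)
    return len(synced_ids)
```

A call could look like `sync_once(fetch, lambda text: create_joplin_note(text, "NOTEBOOK_ID", "TOKEN"), mark)`, where `fetch` and `mark` wrap the storage service and the notebook ID and token are placeholders.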

Dashboard

Using the "Note Overview" plugin you can filter your voice notes inbox and display it in a table.
Example configuration:

<!-- note-overview-plugin
search: notebook:Unsorted -title:"Joplin Voice Inbox"
fields: title, updated_time
sort: updated_time DESC
-->
<!--endoverview-->

Limits

Sync Times to Notes/Day ratio

If you lock your PC each time you step away, it makes sense to run the synchronization on system login.
If you don't use a lock screen, it makes sense to run it on a schedule instead.
In either scenario, it helps to understand the estimated ratio of synchronizations per day to notes per day.
Because of the Airtable free tier's monthly limits, here is a rough estimate of how many voice notes per day you can make for a given number of synchronizations (system logins) per day:

Sync/day    Notes/day
3           26
4           24
5           22
6           20
7           18
8           16
9           14
10          12
11          10
12          8
13          6
14          4
15          2

These limits are rough estimates assuming a heavy user that uses Joplin every day of the given month. Each day off slightly increases the number of voice notes you can make till the end of the month.
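
The table follows a simple pattern: Notes/day = 32 - 2 × Sync/day. A minimal sketch that reproduces it is below. The inputs are assumptions chosen to match the table and are not stated in this post: a budget of 1,000 Airtable API calls per month, a 31-day month, two calls per synchronization (one read, one update), and one call per voice note.

```python
def notes_per_day(syncs_per_day, monthly_calls=1000, days_in_month=31,
                  calls_per_sync=2, calls_per_note=1):
    """Return how many notes fit into the daily call budget after syncing."""
    daily_budget = monthly_calls // days_in_month  # 1000 // 31 == 32
    left_for_notes = daily_budget - calls_per_sync * syncs_per_day
    return max(left_for_notes // calls_per_note, 0)


# Print the same rows as the table: 3 syncs/day up to 15 syncs/day.
for syncs in range(3, 16):
    print(syncs, notes_per_day(syncs))
```

With these defaults the loop prints the same pairs as the table, from `3 26` down to `15 2`. Changing `days_in_month` or `calls_per_sync` shows how sensitive the estimate is to those assumptions.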

How To Set Up?

As mentioned in the disclaimer, preparing a good setup guide that is easy for the average Joplin user to follow would take me some time.

If you like the idea, please like the post so I can gauge the potential impact, and bookmark this thread to stay updated in case I decide to add the steps to this section.

3 Likes

Since you have planned this solution - which you will likely benefit from using yourself - why not start analyzing its potential impact and drafting a setup guide now? This will ensure it works well for you and expedite others' ability to try it.

Well, I already have a working setup that uses self-hosted dockerized JSON storage instead of Airtable. It has some limitations and a risk of data loss, which I'm fine with, but others might not be.

I was writing down the steps to replicate it when I realized things were getting too complicated for the average Joplin user.
So I ended up testing the Airtable API and realized it wouldn't be too much hassle to replace the self-hosted storage with Airtable. On the other hand, I would need to spend at least 2 days testing it and writing a guide.

So before going all in I decided to post a design here to see a couple of things:

  • how many people are interested in it;
  • whether a review of the design reveals any flaws;
  • perhaps there are existing solutions that I couldn't find on the forum;
  • perhaps somebody knows how OpenAI can be easily replaced with a free solution;
  • perhaps Joplin plans to release a native feature soon.

Thanks, this inspires me!

You don't have to give a full-fledged tutorial, but some more details would help:

1.) What does your iOS shortcut look like? I only see Create Recording, but I don't see an option to save that to a file or how to send it to OpenAI.

2.) Would you mind sharing the Python script (possibly along with the iOS automation)?

Just some thoughts on how I might modify your idea:

1.) I think OpenAI is triggered by your iOS shortcut, right? I might just do the recording and upload it somewhere. The post-processing would then be done by a script. The advantage is that I can still record when I don't have internet (e.g. when I'm hiking and get an idea).

2.) Instead of the Web Clipper API on Windows, I'm considering using joplin-cli in a cron job script on my Linux server. The advantage is that it's always on and, I think, simpler.

1 Like

Thank you for the feedback.

I think your point mentioning hiking is valid. There should be a way to delay OpenAI transcription by storing the file locally and sending it for transcription later when a connection is back.
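
A rough sketch of that idea, assuming the recordings end up as files in a folder that a script can read: every queued file is transcribed, and files that fail because there is no connection stay queued for the next run. The folder layout, the `.m4a` extension, and the `transcribe` callable (which would wrap the OpenAI call) are all hypothetical.

```python
from pathlib import Path


def transcribe_pending(pending_dir, done_dir, transcribe):
    """Transcribe every queued recording; keep the ones that fail for a later retry."""
    pending, done = Path(pending_dir), Path(done_dir)
    done.mkdir(parents=True, exist_ok=True)
    results = {}
    for audio in sorted(pending.glob("*.m4a")):
        try:
            results[audio.name] = transcribe(audio)
        except OSError:
            # Network errors such as ConnectionError and urllib's URLError are
            # OSError subclasses: leave the file in the queue and move on.
            continue
        # Move the recording out of the queue only after a successful transcription.
        audio.rename(done / audio.name)
    return results
```

Running this whenever a connection is available would drain the queue gradually, and a recording is never lost just because one attempt failed.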

Also, I did not realize that the Joplin server doesn't have the Web Clipper. I think the script could be extended to use the Joplin CLI instead of Web Clipper API calls, depending on the Joplin deployment type.

P.S. I'm not ready to share the Python script yet. I can give you an idea of what the iOS shortcut looks like and which Airtable API endpoints the Python script calls.

iOS Shortcut (note that "FIFO queue" means the Airtable JSON storage in the design discussed here):

Airtable Endpoints:

  1. GET
    find all
    https://api.airtable.com/v0/appID/tblLID
  2. POST
    push one
    https://api.airtable.com/v0/appID/tblLID
    Body example:
    {
      "records": [
        {
          "fields": {
            "text": "lorem ipsum 6"
          }
        }
      ]
    }
  3. PATCH
    set processed (once the script has read the transcriptions, so they are not processed again)
    https://api.airtable.com/v0/appID/tblLID
    Body example:
    {
      "records": [
        {
          "fields": {
            "processed": true
          },
          "id": "recWTfJ4lsNP0d1hu"
        },
        {
          "fields": {
            "processed": true
          },
          "id": "rec3tijZTY1wxGgIy"
        }
      ]
    }
  4. GET
    find all non-processed (filters out already synced transcriptions). The filter is the URL-encoded string "processed != 1"
    https://api.airtable.com/v0/appID/tblLID?filterByFormula=processed%20!%3D%201
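
As a rough illustration (not the actual script), the filtered GET URL and the PATCH bodies above could be built in Python like this. The helper names are made up for this example, `appID` and `tblLID` are the same placeholders as above, and the batch size of 10 assumes Airtable's limit of 10 records per update request.

```python
import json
import urllib.parse

BASE_URL = "https://api.airtable.com/v0/appID/tblLID"  # placeholders, as in the list above


def unprocessed_url(base_url=BASE_URL):
    """Build the GET URL that filters out already synced transcriptions."""
    # Keep "!" unescaped so the result matches the URL shown above.
    formula = urllib.parse.quote("processed != 1", safe="!")
    return f"{base_url}?filterByFormula={formula}"


def processed_batches(record_ids, batch_size=10):
    """Yield PATCH bodies that mark records processed, at most batch_size per request."""
    for start in range(0, len(record_ids), batch_size):
        chunk = record_ids[start:start + batch_size]
        yield {"records": [{"id": rid, "fields": {"processed": True}} for rid in chunk]}


print(unprocessed_url())
first_body = next(processed_batches(["recWTfJ4lsNP0d1hu", "rec3tijZTY1wxGgIy"]))
print(json.dumps(first_body, indent=2))
```

Each request would also need an `Authorization: Bearer <token>` header, and a complete script would have to follow the `offset` value Airtable returns when a listing spans more than one page.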
1 Like

While I generally prefer typing to speech for note taking, I've often wanted speech input for short, quick things (e.g., "Parked in space A-23"). I don't think, though, that I'd use one that has significant latency (e.g., needs connectivity), as such notes also tend to have a short useful lifetime. I understand that it's probably a considerably larger task and would require more installed resources, but I'd much rather have something that runs locally on my device.

3 Likes

I agree. A native mobile shortcut exposed by the Joplin app itself would be great, not only for long-term note-taking where notes are later dispatched to designated notebooks, but also for short-term scenarios like the one you described.

Unfortunately, I don't see any traction on mobile app shortcuts, so I'm suggesting at least a solution for the long-term case, where you sort the notes out once you're at your PC.

I hadn't really realized that such a use case exists. I'll make it more obvious which use case this design targets.
Thank you for reviewing!

2 Likes

Just a thought: I am reading this after posting a message to the group about the same problem. I don't love the idea, but at least it should be possible:
use the Google Keep widget, as it has all the functionality (voice, text, photo, drawing), and get the notes into Joplin via the API.
It may be a possible step towards something usable.
Right now I use Google Keep on the phone and on the desktop to move stuff into Joplin.

I just saw this in this group and now think it is interesting; maybe I'm wrong.
keep script
"[SOLVED] Importing from Google Keep - #31 by pluraldon"

This was the github
"GitHub - djsudduth/keep-it-markdown: Convert Google Keep notes dynamically to markdown for Obsidian, Logseq, Joplin and Notion using the unofficial Keep API. Also, import simple markdown notes back into Google Keep."