Audio file transcription

As I mentioned in my introduction post, I'm working on a plugin for audio file transcription.

Context

I use Joplin mainly for jotting down meeting notes. My current workflow is as follows.

  1. I go to a meeting. I make quick notes and I turn on a voice recorder.
  2. After the meeting I put the audio file in an audio transcriber (I've been using TurboScribe).
  3. The transcriber returns a big piece of text with timestamps and speaker labels.
  4. I paste the text and my notes in Joplin along with my note template and tell the LLM plugin to put it all into the template.
  5. I fact check it.

Problem

This process works, but uploading the audio file to the transcriber feels cumbersome. I just want to record the audio and paste it into my Joplin note with the transcription under it, without going to an external application and uploading the file there.

Proposed Solution

Therefore I'm working on a plugin which transcribes your selected audio file right in Joplin.

Then I saw that Joplin is participating in GSoC again and I thought it would be a nice feature to have built into Joplin.
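
To make the goal of having the transcription sit under the audio file concrete, here is a minimal sketch of that step. It assumes the audio is attached as a normal Joplin resource link of the form `[name](:/resourceId)` and that the transcript is already available as plain text. The function name `insertTranscriptBelowAudio` is hypothetical and not part of the Joplin plugin API or of the plugin described here.

```typescript
// Hypothetical helper, for illustration only: not part of the Joplin plugin API.
// Inserts a transcript directly below the line that links to the audio resource.
function insertTranscriptBelowAudio(
  body: string,
  resourceId: string,
  transcript: string,
): string {
  const lines = body.split("\n");
  // Joplin links attached resources in the note body as (:/<resource id>).
  const marker = `(:/${resourceId})`;
  const index = lines.findIndex((line) => line.includes(marker));

  if (index === -1) {
    // Audio link not found: append at the end rather than dropping the transcript.
    return `${body}\n\n${transcript}`;
  }

  // Insert a blank line and then the transcript right after the audio link.
  lines.splice(index + 1, 0, "", transcript);
  return lines.join("\n");
}
```

Working on the note body as plain text keeps this step independent of whichever service produced the transcript.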

Discussion Topics

  • What do you think about this feature?
  • Would you use this feature yourself?
  • Is this a feature that should be implemented in Joplin (as a GSoC project) or should it stay a separate plugin?

I think having transcription functionality for audio files would be very helpful! (Particularly for locating existing audio notes)

For reference: Joplin already has some audio transcription functionality (code, documentation). However, it's Android-only and only has a UI for transcribing from microphone input.


If the audio-to-text conversion works well in different languages, this would be a great feature to add to Joplin!


RJCA In my plugin I simply implemented the Gemini and OpenAI APIs, so the user can choose between the two and fill in an API key, but the plugin is easily extensible for other APIs if OpenAI and Gemini don't cut it.

In my experience the OpenAI Whisper models and gemini-2.0-flash both work great in English and Dutch (even if both languages are spoken in one recording). You can also choose the model manually in the advanced settings if you require more accuracy.
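
One way a plugin can stay extensible for other APIs is to hide each service behind a small common interface and look it up by the name chosen in the settings. The sketch below only illustrates that idea; it is not the plugin's actual code. `TranscriptionProvider`, `ProviderRegistry`, and the `echo` provider are all hypothetical names, and the stand-in provider makes no network calls.

```typescript
// All names here are hypothetical; this is a sketch, not the plugin's real code.
interface TranscriptionProvider {
  readonly id: string;
  // `model` is optional so a provider can fall back to its own default.
  transcribe(audio: Uint8Array, apiKey: string, model?: string): Promise<string>;
}

class ProviderRegistry {
  private providers = new Map<string, TranscriptionProvider>();

  register(provider: TranscriptionProvider): void {
    this.providers.set(provider.id, provider);
  }

  get(id: string): TranscriptionProvider {
    const provider = this.providers.get(id);
    if (!provider) {
      throw new Error(`Unknown transcription provider: ${id}`);
    }
    return provider;
  }

  ids(): string[] {
    return Array.from(this.providers.keys());
  }
}

// Stand-in provider so the sketch runs without network access or a real API key.
const echoProvider: TranscriptionProvider = {
  id: "echo",
  async transcribe(audio, _apiKey, model = "default-model") {
    return `[${model}] ${audio.length} bytes received`;
  },
};

const registry = new ProviderRegistry();
registry.register(echoProvider);
```

With this shape, adding another service means writing one more object that implements `transcribe` and registering it; the settings screen can list `registry.ids()` and pass the selected model through unchanged.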
