Plugin: offline OCR (extract text from images, PDFs, videos, etc.)

Joplin OCR

This plugin is still in the development stage. Everything may change, but some features are available now.

Feature & UI suggestions / bug reports are welcome!

Features

  • OCR for local/remote images
  • OCR for local/remote videos
  • OCR for local PDFs

User Guide

Before you start using this plugin, please set the language codes:

All available language codes are listed in tesseract_lang_list.md in the naptha/tesseract.js repository on GitHub (master branch).
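
As a rough illustration of what setting several language codes involves: Tesseract conventionally takes multiple languages as a single string joined with `+` (for example `eng+fra`). The sketch below assumes the codes are entered as a comma- or space-separated list; that input format and the helper name `toTesseractLangs` are hypothetical and not taken from the plugin's code.

```typescript
// Hypothetical helper, not part of the plugin itself.
// Assumption: the user enters codes such as "eng, fra, deu, spa".
function toTesseractLangs(setting: string): string {
  // Split on commas, plus signs, or whitespace, and drop empty pieces.
  const codes = setting
    .split(/[,+\s]+/)
    .filter((code) => code.length > 0);

  // Remove duplicates while keeping the original order.
  // Codes are case-sensitive (e.g. "chi_sim"), so they are not lowercased.
  const unique = codes.filter((code, i) => codes.indexOf(code) === i);

  if (unique.length === 0) {
    throw new Error("No language codes configured");
  }
  // Tesseract-style multi-language string.
  return unique.join("+");
}

console.log(toTesseractLangs("eng, fra, deu, spa")); // eng+fra+deu+spa
```

The codes `eng`, `fra`, `deu` and `spa` are the entries for English, French, German and Spanish in the language list linked above.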

In the note view panel, click the icon in the top-right corner to start OCR.

Hey, I just wanted to say keep up the great work. I am sure many people would benefit from this. Please keep posting on this page about any major improvements.

Thanks, I thought OCR was not a feature the Joplin community needed, since there were no replies to this post :joy:

Hi, I'm using Joplin 2.5.8, and though I get the option to set up languages on the Options menu, I am not able to see the icon you mention here to start OCR. I've set the codes for English, French, German and Spanish. What am I missing?
Sounds like a very useful plugin, at any rate.

Improving the process of OCR for Joplin notes is definitely worthwhile. It's the OCR in Evernote that is hardest to lose when migrating.

So far I have been using plamola's ocr-joplin-notes Python script to OCR existing notes, although harun27's version also works on PDFs.

Are you effectively converting this process to a plugin, i.e. is it intended to work on existing notes?

Are you using the WYSIWYG editor mode? This plugin doesn't support it. If not, hover your mouse over an image or video and the icon will appear in the top-right corner.

Yes, it's on the roadmap. You mean scanning and recognizing all the content of existing notes, right?

So when do you use your current plugin - in the WebClipper before capturing an image containing text?

Sometimes I take photos of books and attach them to Joplin notes. I use this plugin to extract the text from them.

I see the symbol now. Tried it out, works! Very nice. Not to be "Mr Picky", but what do I need to do to use it on PDFs?
At any rate, awesome work.

PDF recognition is still in development, but it will be available very soon.

  1. Seems to work on images quite well, even with small fonts. When the text is recognised, can it be automatically dropped into the note to avoid a manual cut and paste?

  2. I think showing the image of the web clipper in your example is confusing, since the plugin works on images in the note display in Joplin.

  3. Doesn't work on PDFs yet, but a question: how will you handle multi-page PDFs?

  4. It installs directly from Options in Joplin, so you don't need to download it from GitHub, but does that always get you the latest version?

  5. Can you remember the language selection from the previous run? I think most of us won't be swapping languages that often.

@myfta @lumogas @Xindi

PDF OCR support is now available in version 0.1.0. Any suggestions are welcome.

Nice suggestion. It will be a feature in a future version.

I updated the documentation to make it clearer.

Check it out in version 0.1.0.

Do you mean the language packages? It seems they are never updated, so there's no need to worry about that.

This will be a future feature.

Thanks, very nice plugin.

Seems to work fine for me. Thanks for that :+1:

Is this an offline OCR system?

Yes, nothing is uploaded.