Plugin: offline OCR (extract text from images, pdf, videos, etc)

ylc395 · 27 October 2021 13:25

Joplin OCR

This plugin is still in development. Everything may change, but some features are available now.

Feature & UI suggestions / bug reports are welcome!

Features

OCR for local/remote images
OCR for local/remote videos
OCR for local PDFs

User Guide

Before using this plugin, please set the language codes:

All available language codes are listed in tesseract.js/docs/tesseract_lang_list.md in the naptha/tesseract.js repository on GitHub.
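As a rough illustration only (this is not the plugin's actual code), Tesseract-style language codes are short identifiers such as eng, fra, deu and spa, and Tesseract conventionally joins several languages into one string with "+". The helper name `toTesseractLangString` and the comma/space-separated input format are assumptions made for this sketch; the plugin's own setting may expect a different format.

```typescript
// Hypothetical helper, not taken from the plugin source.
// Assumes the user types codes separated by commas, spaces, or "+".
function toTesseractLangString(input: string): string {
  const codes = input
    .split(/[,\s+]+/)                 // split on commas, whitespace, or "+"
    .map((code) => code.trim())
    .filter((code) => code.length > 0); // drop empty fragments

  // Remove duplicates while keeping the order the user entered.
  const unique = Array.from(new Set(codes));

  // Tesseract's convention for multiple languages is "eng+fra+...".
  return unique.join("+");
}

// English, French, German and Spanish:
console.log(toTesseractLangString("eng, fra, deu, spa")); // "eng+fra+deu+spa"
```

Codes containing an underscore, such as chi_sim, pass through unchanged because only commas, whitespace and "+" are treated as separators.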

In the note view panel, click the icon in the top-right corner to start OCR.

Xindi · 29 October 2021 19:41

Hey I just wanted to say keep up the great work. I am sure my people would benefit from this. Please keep posting on this page for any major improvements.

ylc395 · 30 October 2021 03:53

Thanks. I thought OCR wasn't a feature the Joplin community needed, since there were no replies to this post.

lumogas · 31 October 2021 17:43

Hi, I'm using Joplin 2.5.8, and though I get the option to set up languages on the Options menu, I am not able to see the icon you mention here to start OCR. I've set the codes for English, French, German and Spanish. What am I missing?
Sounds like a very useful plugin, at any rate.

myfta · 31 October 2021 20:48

Improving the process of OCR for Joplin notes is definitely worthwhile. It's the OCR in Evernote that is hardest to lose when migrating.

I have been using plamola's ocr-joplin-notes Python script to OCR existing notes, although harun27's version also works on PDFs.

Are you effectively converting this process to a plugin, i.e. is it intended to work on existing notes?

ylc395 · 1 November 2021 00:57

Are you using the WYSIWYG editor mode? This plugin doesn't support it. If not, move your mouse over an image/video and the icon will appear in the top-right corner.

ylc395 · 1 November 2021 01:07

Yes, it's on the roadmap. You mean scanning and recognizing all content in existing notes, right?

myfta · 1 November 2021 07:48

So when do you use your current plugin - in the WebClipper before capturing an image containing text?

ylc395 · 1 November 2021 08:07

Sometimes I take photos of books and attach them to Joplin notes. I use this plugin to extract text from them.

lumogas · 2 November 2021 11:20

I see the symbol now. Tried it out, works! Very nice. Not to be "Mr Picky", what do I need to do to use it on PDFs?
At any rate, awesome work.

ylc395 · 2 November 2021 12:31

PDF recognition is still in development, but it will be available very soon.

myfta · 5 November 2021 21:48

Seems to work on images quite well, even small fonts. When the text is recognised can it be automatically dropped into the note to avoid a manual cut and paste?
I think showing the image of the web clipper in your example is confusing; it works on images in the note display in Joplin.
Doesn't work on PDFs yet, but a question - How will you handle multiple page PDFs?

myfta · 5 November 2021 21:52

It installs directly from Options in Joplin, so you don't need to download from Github, but does that always get you the latest version?
Can you remember the language selection from the previous run? I think most of us won't be swapping languages that often.

ylc395 · 7 November 2021 12:07

@myfta @lumogas @Xindi

PDF OCR support is available in version 0.1.0 now. Any suggestions would be helpful.

ylc395 · 7 November 2021 12:09

Nice suggestion. It will be a feature in a future version.

I updated the documentation to make it clear.

Check it out in version 0.1.0.

ylc395 · 7 November 2021 12:10

Do you mean the language packages? It seems they are never updated, so there's no need to worry about that.

This will be a future feature.

Giacomo · 7 November 2021 14:31

Thanks, very nice plugin.

bepolymathe · 7 November 2021 18:09

Seems to work fine for me. Thanks for that

kartoo · 8 November 2021 00:34

Is this an offline OCR system?

ylc395 · 8 November 2021 00:44

Yes, nothing is uploaded.
