Pre-release OCR(!) some questions/suggestions

steve28 · 8 January 2024 19:00

I have been playing with the new OCR feature in 2.14 pre-release and I have a few questions:

1. Does it OCR all pages of PDFs or just the first N?
2. Is it smart enough to skip PDFs that already have the text in them?
3. Will this work in the CLI version as well?

And some suggestions:

Is there any way to know/see progress of OCRing the notes? Or some indication that OCR has been run on a particular note/attachment?
An option to set how many pages get OCRed (question 1 above) would be neat.

Thank you so much for bringing OCR to Joplin - it was the last thing keeping me on Evernote!

laurent · 8 January 2024 19:30

All pages

Well, no, but that's only because we need to process them so that the text is available for search. And OCR is the most reliable way to extract the text from a PDF, which can be structured in any random way.

Not currently

At the moment the only way is by opening the dev tools, under the Help menu

robe070 · 4 February 2024 23:44

What are you meant to see there that indicates the progress of OCRing the notes?

laurent · 4 February 2024 23:49

If you open the dev tools and filter by OcrService, you will see what it is doing, including how many resources have been processed and how many still need to be done.
