I have been playing with the new OCR feature in the 2.14 pre-release and I have a few questions:
- Does it OCR all pages of PDFs or just the first N?
- Is it smart enough to skip PDFs that already have the text in them?
- Will this work in the CLI version as well?
And some suggestions:
- Is there any way to see the progress of OCRing the notes? Or some indication that OCR has been run on a particular note/attachment?
- An option to set the page limit from the first question above would be neat.
Thank you so much for bringing OCR to Joplin - it was the last thing keeping me on Evernote!
Well, no, but that's only because we need to process them so that the text is available for search. And OCR is the most reliable way to extract the text from a PDF, whose internal structure can be arbitrary.
At the moment the only way is by opening the dev tools, under the Help menu.
What are you meant to see there that indicates the progress of OCRing the notes?
If you open the dev tools and filter by OcrService, you will see what it is doing, including how many resources have been processed and how many still need to be done: