[Beta Test] Joplin 2.14.12 - Cannot OCR without a sync target set?

Joplin 2.14.12 (prod, win32) Windows 10 22H2
Client ID: 2570fa372aa54cb0a6cc28091a7fd6b2
Sync Version: 3
Profile Version: 45
Keychain Supported: Yes
Revision: 2881993
Simple Backup: 1.3.5

I have just started playing with OCR in the 2.14.12 Joplin Desktop pre-release. I have a new install that is unchanged from default apart from OCR being switched on. The sync target is unchanged and shows None.

When I add a simple (11 kB) PDF file to a note, its contents cannot be found by search, even if I wait a while. However, as soon as I set up a sync target (file system in this case), it works. This has happened several times after resetting Joplin by deleting joplin-desktop and trying again.

If I do a reset, set up sync first, and then add the PDF to a new note, its text is found immediately after the note is created.

Is this just an odd coincidence or is the OCR somehow triggered by the sync function?

ocr_test.pdf (10.4 KB)


I can reproduce this issue on Ubuntu 23.10[1].

  1. v2.14.12, commit caf806f7033468b65d043c1ca556344f0611e9af, simulating Joplin Portable to test this pull request.

OCR data being available for search depends on the data being indexed by the search engine, and I think the note-resource association also being indexed. Maybe some of this doesn't happen quickly enough by default and is indirectly triggered by sync.

I tried this on Linux Mint 21.3 and got the same result.

In fact 1 hour after the note was created the text of the PDF could still not be found. I then set up a file system sync and the PDF text was searchable immediately after the first sync occurred.

It should be clarified that nothing else was being done with Joplin apart from quitting and restarting it once and occasionally searching for words that were only in the PDF. So there may be some other action that also triggers OCR. However, under these particular circumstances it appears that OCR data either does not get created / indexed at all, or not within a reasonable time-frame.

Probably an edge-case but also possibly useful to be aware that this can happen.


"In fact 1 hour after the note was created the text of the PDF could still not be found."

Not so fast! You in fact needed to wait 4 hours for indexing to work :D I've now updated this so that indexing starts right after OCR is done.
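For anyone curious why the trigger matters, here is a rough sketch of the idea only. It is not Joplin's actual code: `SearchIndex`, `OcrService`, `Resource` and `demo` are hypothetical names made up for illustration. It assumes OCR text only becomes searchable once an indexing pass has picked it up, and compares "no indexing pass yet" with "indexing runs right after OCR finishes".

```typescript
// Hypothetical sketch: none of these names are Joplin's real API.
type Resource = { id: string; ocrText: string | null };

class SearchIndex {
  private indexed = new Map<string, string>();

  // One indexing pass: pick up every resource that already has OCR text.
  public sync(resources: Resource[]): void {
    for (const r of resources) {
      if (r.ocrText !== null) this.indexed.set(r.id, r.ocrText);
    }
  }

  // Case-insensitive substring search over indexed text only.
  public search(term: string): string[] {
    const needle = term.toLowerCase();
    const hits: string[] = [];
    this.indexed.forEach((text, id) => {
      if (text.toLowerCase().includes(needle)) hits.push(id);
    });
    return hits;
  }
}

class OcrService {
  private onDone: (() => void) | null;

  constructor(onDone: (() => void) | null) {
    this.onDone = onDone;
  }

  // Stand-in for real OCR: stores the recognised text, then notifies.
  public process(resource: Resource, recognisedText: string): void {
    resource.ocrText = recognisedText;
    if (this.onDone) this.onDone();
  }
}

// Returns the search hits for "hello" right after OCR has finished.
function demo(triggerIndexAfterOcr: boolean): string[] {
  const resources: Resource[] = [{ id: 'pdf1', ocrText: null }];
  const index = new SearchIndex();
  const callback = triggerIndexAfterOcr ? () => index.sync(resources) : null;
  const ocr = new OcrService(callback);
  ocr.process(resources[0], 'Hello from the PDF');
  return index.search('hello');
}
```

With `demo(false)` the OCR text exists but nothing has indexed it yet, so the search comes back empty until some later pass (a timer, or whatever else happens to kick one off) runs. With `demo(true)` the callback indexes as soon as OCR completes, so the text is searchable straight away.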
