After migrating my Evernote notes to Joplin, I missed the ability to do a full-text search in attachments. I got inspired by a post on this forum:
After looking at how rest-uploader does OCR, I decided to write a script that adds OCR text to existing Joplin notes.
The ocr-joplin-notes script can read notes from Joplin via the web clipper interface, OCR any image or PDF attachment, and insert the recognized text as a comment block in the note. For a PDF document, it can also add a preview image.
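A minimal sketch of the "insert as a comment block" step, assuming the recognized text is stored as an HTML comment placed right after the line that links to the attachment in the note's Markdown body. The `OCR` label, the placement rule and the function names are hypothetical and not necessarily what ocr-joplin-notes does; the `(:/<resource id>)` link form is how Joplin references attachments in Markdown.

```python
import re


def to_html_comment(ocr_text: str, label: str = "OCR") -> str:
    """Wrap OCR output in an HTML comment: hidden when rendered, still in the note body."""
    # "--" is not allowed inside an HTML comment (and "-->" would end it early),
    # so put a space after every dash that is followed by another dash.
    safe = re.sub(r"-(?=-)", "- ", ocr_text.strip())
    return f"<!-- {label}\n{safe}\n-->"


def insert_ocr_block(body: str, resource_id: str, ocr_text: str) -> str:
    """Insert the comment after the first line linking to the attachment, else append it."""
    block = to_html_comment(ocr_text)
    link = f"(:/{resource_id})"  # Joplin's Markdown link target for a resource
    lines = body.split("\n")
    for i, line in enumerate(lines):
        if link in line:
            lines.insert(i + 1, block)
            return "\n".join(lines)
    # No link found (hypothetical fallback): append the block at the end of the note.
    return body.rstrip("\n") + "\n\n" + block
```

Keeping the text in a comment means it does not clutter the rendered note, while the raw body (which is what a text search looks at) still contains it.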
The script uses a simple detection algorithm to skip notes it suspects were created by rest-uploader, as well as notes it has already processed. The current version requires a tag to be supplied on the command line, and it only processes the notes with that tag. Once all notes with that tag have been processed, the script terminates. More details can be found in the README.
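The tag-driven loop could look roughly like this. It relies on what is visible later in this thread: Joplin's list endpoints are paged with `page`/`limit` and report `has_more`. `GET /tags/<id>/notes` is part of Joplin's Data API, but the marker used to recognize already-processed notes and all helper names are assumptions, and the rest-uploader detection is left out because its heuristic isn't described here.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

OCR_MARKER = "<!-- OCR"  # hypothetical marker left in the body by an earlier run


def iter_pages(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield every item from a paged Joplin list endpoint until has_more is false."""
    page = 1
    while True:
        data = fetch_page(page)
        yield from data.get("items", [])
        if not data.get("has_more"):
            return
        page += 1


def tag_notes_fetcher(base_url: str, token: str, tag_id: str, limit: int = 10):
    """Return a fetch_page function that calls GET /tags/<tag_id>/notes."""
    def fetch_page(page: int) -> dict:
        query = urllib.parse.urlencode(
            {"fields": "id,title,body", "limit": limit, "page": page, "token": token}
        )
        with urllib.request.urlopen(f"{base_url}/tags/{tag_id}/notes?{query}", timeout=10) as resp:
            return json.load(resp)
    return fetch_page


def notes_to_process(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Skip notes whose body already carries the OCR marker."""
    for note in iter_pages(fetch_page):
        if OCR_MARKER not in note.get("body", ""):
            yield note
```

Usage would be along the lines of `for note in notes_to_process(tag_notes_fetcher("http://localhost:41184", token, tag_id)): ...`. Injecting `fetch_page` keeps the paging and skipping logic testable without a running Joplin instance.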
The ocr-joplin-notes script is written in Python and has been tested on Ubuntu.
I just published a first version to both GitHub and PyPI.
I guess this could also be implemented as a plugin.
It would be a completely different beast, since a plugin has to be written in JavaScript, whereas this script is written in Python. A plugin might also require the user to install additional libraries to do the actual OCR.
It's not a project I'm going to take on in the foreseeable future.
Hi, in your Docker instructions you indicate that a docker-env file with the Joplin token is required. Where should this docker-env file be placed on a Windows 10 installation?
I checked the web clipper options page in Joplin again, and it confirmed that the service was already running on port 41184.
However, just to be sure, I added the JOPLIN_SERVER line to the docker-env file. That got rid of the error, but it still fails to run. Here is the complete output (I have checked the token value again, just to be sure):
docker run --env-file ./docker-env --network="host" plamola/ocr-joplin-notes:0.2.3 python -m ocr_joplin_notes.cli --mode=TAG_NOTES
Mode: TAG_NOTES
Language: eng
Add previews: yes
Autorotation: yes
Tagging notes. This might take a while. You can follow the progress by watching the tags in Joplin
Connection Error. URL: http://localhost:41184/notes?order_by=title&limit=10&page=1&token=xxxxxxxx
If the URL in the error message is actually the real URL you got (you did not replace your token with xxxxxxxx before posting it here), then the token has not been set up correctly in the environment file.
If you paste the URL from the error message into a browser, you should get a response from Joplin, assuming the right token is set up. If that doesn't work, something might be blocking access to the port. In that case, you will probably run into a similar problem when using the Joplin Web Clipper extension in your browser.
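The browser check can also be scripted, which makes it easy to compare what works on the host with what works inside a container. This sketch rebuilds a URL shaped like the one in the connection error and reports whether Joplin answers with a JSON note list. The function names are hypothetical; only the Python standard library is used.

```python
import http.client
import json
import urllib.parse
import urllib.request


def notes_url(base_url: str, token: str, page: int = 1, limit: int = 10) -> str:
    """Build a /notes URL with the same query parameters as in the error message."""
    query = urllib.parse.urlencode(
        {"order_by": "title", "limit": limit, "page": page, "token": token}
    )
    return f"{base_url}/notes?{query}"


def joplin_reachable(base_url: str, token: str, timeout: float = 5.0) -> bool:
    """True if the web clipper service answers with a JSON object containing 'items'."""
    try:
        with urllib.request.urlopen(notes_url(base_url, token), timeout=timeout) as resp:
            data = json.load(resp)
    except (OSError, http.client.HTTPException):
        # URLError (refused or unreachable), HTTPError (e.g. a rejected token)
        # and timeouts are all OSError subclasses.
        return False
    except ValueError:
        # Something answered on the port, but not with JSON.
        return False
    return isinstance(data, dict) and "items" in data
```

Running `joplin_reachable("http://localhost:41184", token)` once on the host and once inside the container separates a token problem (fails in both places) from a network problem (fails only in the container).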
The --network="host" parameter should give the Docker container access to the network on your machine. I've never run Docker on Windows myself, so it might work differently there.
Now that's interesting. When I paste the URL into the browser, I get a Welcome to Joplin message, and it returns the first 9 notes (as text), ending with: ,"has_more":true}
So I guess that is good news: correct port and token. So why doesn't it work with Docker?
Hmmm, I found the following in the Docker documentation:
The host networking driver only works on Linux hosts, and is not supported on Docker Desktop for Mac, Docker Desktop for Windows, or Docker EE for Windows Server.
I guess accessing the web clipper interface isn't that easy from a Docker container on a non-Linux system. I believe the Joplin web clipper service only binds to localhost, which is a good thing from a security perspective, but it also means it can't be accessed through any actual network interface.
There might be a workaround: a locally installed proxy, like nginx, that exposes the web clipper service on a (virtual) network interface, which a Docker container could then access. But that makes the whole setup a lot more complex.
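To make the proxy idea concrete, here is a small Python TCP relay standing in for nginx (a swapped-in technique, not part of ocr-joplin-notes). It accepts connections on a listen address and forwards each one to the web clipper on 127.0.0.1:41184. The listen address and port 41185 are assumptions: which interface a container can actually reach on Docker Desktop for Windows has to be checked locally, and anything that can reach the relay can talk to Joplin's API (the token is still required), so avoid exposing it on a public interface.

```python
import asyncio

LISTEN_HOST = "0.0.0.0"  # all interfaces; narrow this to the virtual adapter's address
LISTEN_PORT = 41185      # hypothetical port for the relay


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass  # one side went away; closing below tears down this direction
    finally:
        writer.close()


async def start_relay(listen_host: str, listen_port: int,
                      target_host: str = "127.0.0.1", target_port: int = 41184) -> asyncio.AbstractServer:
    """Start a server that forwards every accepted connection to target_host:target_port."""
    async def handle(client_reader: asyncio.StreamReader, client_writer: asyncio.StreamWriter) -> None:
        try:
            up_reader, up_writer = await asyncio.open_connection(target_host, target_port)
        except OSError:
            client_writer.close()  # Joplin is not running or not listening
            return
        # Shovel bytes in both directions until both sides are done.
        await asyncio.gather(pipe(client_reader, up_writer), pipe(up_reader, client_writer))

    return await asyncio.start_server(handle, listen_host, listen_port)


async def main() -> None:
    server = await start_relay(LISTEN_HOST, LISTEN_PORT)
    async with server:
        await server.serve_forever()
```

Start it on the host with `asyncio.run(main())`; the container would then use the host's address on that interface and port 41185 instead of `localhost:41184`. Compared with nginx this is easy to read and stop, but it does no TLS, logging or access control.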
So your best bet might be to install the Python library directly on your Windows machine.
I currently don't have Python installed on my machine, so I can't verify this, but I suspect the command-line example might have been incorrect. The script is a module, so it has to be run with the -m option.
Try this: python3 -m ocr_joplin_notes.cli --mode=TAG_NOTES