Following up on the discussion in "OCR for existing Joplin notes", here are my "instructions" for how to achieve this in Windows.
These instructions are for Python 3.8 running in Anaconda 1.10.0 on Windows.
At your Anaconda prompt, install rest-uploader, ocr_joplin_notes and pytesseract:
pip install rest-uploader
pip install ocr_joplin_notes
pip install pytesseract
Install Tesseract as a regular Windows installation from Home · UB-Mannheim/tesseract Wiki · GitHub. As per the recommendation there, I used tesseract-ocr-w64-setup-v5.0.0-alpha.20201127.exe.
Add User Variables in Windows - Settings - Advanced System Settings:
Variable name TESSDATA_PREFIX
Variable value C:\Program Files\Tesseract-OCR\tessdata
In the System variables, edit PATH to add
C:\Users\graha\Anaconda3\python.exe
and
C:\Program Files\Tesseract-OCR
Whilst there, add your Joplin Token:
Variable name JOPLIN_TOKEN
Variable value "your Joplin Token from Joplin - Tools - Options - Web Clipper"
Check or add PYTHONPATH in System Variables
Variable name PYTHONPATH
Variable value C:\Users\<username>\Anaconda3\python.exe
In my case, I am using Python 3.8 in Anaconda, but this needs to point to your own Python executable.
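Before going further, it can save time to confirm that the variables above are actually visible to Python (open a new prompt first, since Windows only passes changed variables to newly started programs). The following is a minimal sketch, not part of ocr_joplin_notes; the function names `missing_variables` and `tesseract_on_path` are made up for this illustration.

```python
import os
import shutil

# The two variables set in the steps above.
REQUIRED_VARS = ("TESSDATA_PREFIX", "JOPLIN_TOKEN")


def missing_variables(env=None):
    """Return the names from REQUIRED_VARS that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def tesseract_on_path(path=None):
    """Return the full path of the tesseract executable found on PATH, or None."""
    return shutil.which("tesseract", path=path)


if __name__ == "__main__":
    for name in missing_variables():
        print(f"{name} is not set")
    print("tesseract found at:", tesseract_on_path())
```

If a variable is reported as not set, or tesseract is reported as None, recheck the corresponding entry in the Environment Variables dialog.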
Despite Anaconda being up to date with the latest version, there were some issues with the installed packages; in one case, two versions of the same package were installed. The solution is to uninstall and reinstall the relevant packages using pip uninstall/install at the Anaconda prompt.
I spotted these when trying to run the final instruction for ocr_joplin_notes: the affected packages were mentioned in the error messages, so I fixed them one by one.
mkl-service was also missing and needed to be installed, so here is the complete list:
conda install -c conda-forge mkl-service
pip uninstall opencv-python
pip install opencv-python
pip uninstall numpy (first time, to remove numpy-1.20.2)
pip uninstall numpy (second time, to remove numpy-1.19.2)
pip install numpy (installs a clean version of numpy-1.20.2)
pip uninstall pillow
pip install pillow
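To see what ended up installed after the reinstall steps, something like the sketch below can be run from the same Anaconda prompt. It uses `importlib.metadata` from the Python 3.8 standard library. The helper name `installed_versions` is made up for this illustration, and the package names are assumed to match the names used with pip above. Note that it reports only one version per package, so it will not by itself reveal a leftover duplicate such as the two numpy versions mentioned earlier.

```python
from importlib import metadata  # standard library since Python 3.8

# Names as used with pip in the steps above (assumed to match the
# installed distribution names).
PACKAGES = ("numpy", "opencv-python", "pillow", "pytesseract",
            "rest-uploader", "ocr_joplin_notes")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```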
Now make sure Joplin is running. Back up all your notes to a JEX file first: Joplin - File - Export All - JEX
Then, at a regular Windows Command Prompt, I can run:
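One way to check that Joplin is running and reachable is to ping its Web Clipper service, which is the same place the token comes from. The sketch below assumes the service is enabled and listening on Joplin's default port 41184; if your Web Clipper options show a different port, pass that instead. The function name `joplin_is_running` is made up for this illustration.

```python
import http.client
import urllib.request


def joplin_is_running(port=41184, timeout=2.0):
    """Return True if a Joplin Web Clipper service answers /ping on this port."""
    url = f"http://127.0.0.1:{port}/ping"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8").strip()
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a malformed reply: treat as not running.
        return False
    # The clipper service identifies itself with this fixed string.
    return body == "JoplinClipperServer"


if __name__ == "__main__":
    print("Joplin reachable:", joplin_is_running())
```

If this prints False, check that Joplin is open and that the Web Clipper service is enabled under Joplin - Tools - Options - Web Clipper.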
python -m ocr_joplin_notes.cli --mode=TAG_NOTES
and it proceeds to tag all my notes with the scheme described under "Mode TAG_NOTES" in GitHub - plamola/ocr-joplin-notes: Add OCR data from PDF and image files as a comment in Joplin, to enable full-text search.