File Uploader and OCR

Excellent! I should document the notebook thing a little better / put it in the cli output or something.

Great to hear your feedback, thanks!

I just tried to get a headless setup going and was able to do so using @steve28’s approach. However, when I set the monitoring path to the mountpoint of a WebDAV mount, the script failed to detect new files when I added them on the WebDAV server. Adding them locally (directly to the mountpoint) worked just fine. Not sure what’s causing this.

Yeah, I had that same problem. I was able to get it working with an NFS mount, however. For me it was an SMB mount that wouldn’t detect new files.

@Shamp0o @steve28 watchdog is used to monitor changes on the filesystem. Depending on the OS and the type of the mount point, it’s possible that it can fail to detect changes. But these are problems that have to be fixed in the watchdog code.

E.g. on Linux one usually monitors changes to the inode data. However, (faulty) implementations of NFS/SMB might not propagate that info at the filesystem level, or certain components might not support that info at all.
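
For context: Linux change notifications (inotify) report only events triggered through the local filesystem API, so files written on the remote side of a network mount may never produce an event. A common workaround is to poll the directory listing instead. The following is a minimal sketch of that idea using only the standard library; `snapshot` and `poll_for_new_files` are hypothetical names and this is not how rest_uploader is implemented. watchdog itself also ships a `PollingObserver` (in `watchdog.observers.polling`) that works along similar lines.

```python
import os
import time


def snapshot(directory):
    """Return the set of regular-file paths currently in `directory`."""
    with os.scandir(directory) as entries:
        return {entry.path for entry in entries if entry.is_file()}


def poll_for_new_files(directory, on_new_file, interval=5.0, max_polls=None):
    """Call `on_new_file(path)` for each file that appears in `directory`.

    Compares directory listings instead of relying on OS change events,
    so it can also notice files added on the remote side of a network mount.
    `max_polls=None` polls forever; a number makes the loop finite.
    """
    seen = snapshot(directory)
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)
        current = snapshot(directory)
        for path in sorted(current - seen):
            on_new_file(path)
        seen = current
        polls += 1
```

Polling trades latency and some extra I/O for independence from the mount type; the interval is a judgment call.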

I see. Well, I guess it’s time to look into other solutions then.

@steve28 did you change any settings in your terminal Joplin app to automatically sync whenever there’s a change? I have to manually sync in order to make notes created by the rest uploader appear. Running a regular instance of Joplin alongside the server like you described in your post doesn’t work for me.

Update

I released a new version. It adds automatic tagging functionality based on OCR/body text.

Changes:

  • Ignore .part files (caused app crashes on KDE systems)
  • Added logic to automatically assign tags to newly created notes
  • Added click option to turn off automatic tag generation

The tag matching logic is not overly sophisticated, so if you don’t like it, just turn it off using the command line option --autotag=no
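
To give an idea of what simple tag matching on OCR/body text can look like, here is a minimal sketch. It is an illustration only, not the actual rest_uploader logic: the function name `match_tags` and the whole-word, case-insensitive rule are assumptions.

```python
import re


def match_tags(body, known_tags):
    """Return the known tags that occur as whole words in `body`.

    Matching is case-insensitive; tags are returned in their original
    spelling and in the order given.
    """
    matched = []
    for tag in known_tags:
        # Require non-word characters (or string edges) around the tag,
        # so a tag like "in" does not match inside "invoice".
        pattern = r"(?<!\w)" + re.escape(tag) + r"(?!\w)"
        if re.search(pattern, body, flags=re.IGNORECASE):
            matched.append(tag)
    return matched


print(match_tags("Invoice from ACME for car insurance",
                 ["invoice", "insurance", "tax"]))
# prints ['invoice', 'insurance']
```

A rule this simple will produce false positives on common words, which is one reason a switch to turn tagging off is useful.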

Thanks!

Trying to get this to work with JoplinPortable on Windows.

I have installed Poppler and tesseract-OCR and ensured environment variables work for both.

I have downloaded rest_uploader-1.5.0.tar.gz, extracted with 7Zip and placed the tree in my Progs folder.

I have installed WinPython and confirmed that typing “python -V” in a Command Prompt window returns “Python 3.7.7”

When I type

python -m rest_uploader.cli /path/to/directory

it returns

python.exe: Error while finding module specification for 'rest_uploader.cli' (ModuleNotFoundError: No module named 'rest_uploader')

I have tried adding the path to rest_uploader and rest_uploader-1.5.0 to PATH but get the same error message.

I’m guessing I’m missing the step where I ‘tell’ WinPython where rest_uploader is? This is the first time I’ve met Python and researching it has me massively confused, as there seem to be so many different ways to use it.

I also notice there isn’t a file called rest_uploader.cli
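
Some background on the error: `python -m` looks modules up on Python's own search path (`sys.path`, which includes `site-packages`), not on the Windows PATH variable, so adding the extracted folder to PATH has no effect. The dotted name `rest_uploader.cli` refers to a module `cli` inside the package `rest_uploader` (typically a file `cli.py` in the package directory), not to a file literally named `rest_uploader.cli`. A small sketch for checking what the interpreter can see; `is_importable` is a hypothetical helper name:

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if Python can locate `module_name` on sys.path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "rest_uploader" in
        # "rest_uploader.cli") cannot be found.
        return False


if __name__ == "__main__":
    print("rest_uploader importable:", is_importable("rest_uploader"))
    print("Directories searched for modules:")
    for entry in sys.path:
        print("   ", entry)
```

If this prints `False`, the package is not installed for the interpreter that is running the script.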

Any help would be appreciated.

Ah, packaging in python - not always the most straightforward. I think you’re most of the way there!
Try this from a PowerShell prompt (you might need to run PowerShell as admin if you don’t have a virtual environment set up):
pip install rest_uploader
Then you should be able to call rest_uploader.exe from the command line.

Thanks for replying so quickly

PS D:\Progs\wpy64-3770\python-3.7.7.amd64> PIP install rest_uploader
PIP : The term 'PIP' is not recognized as the name of a cmdlet, function, script file, or operable program

both in PowerShell as admin or not

Interesting that when I type python -V in powershell nothing is returned but in a Command Prompt window I get the version no. Bizarre.

So it looks like my download of WinPython doesn’t include PIP (although it says it does)
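
A note on that error: "not recognized" only means the shell found no `pip` executable on its PATH, which is not the same as pip being absent from the Python installation. A minimal sketch for checking both, assuming it is run with the interpreter in question; `locate_tools` and `pip_version` are hypothetical helper names:

```python
import shutil
import subprocess
import sys


def locate_tools(names):
    """Map each command name to its full path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


def pip_version():
    """Ask the running interpreter for its own pip, bypassing PATH lookup."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode, result.stdout.strip()


if __name__ == "__main__":
    for name, path in locate_tools(["python", "pip", "rest_uploader"]).items():
        print(name, "->", path or "not found on PATH")
    print(pip_version())
```

Running pip as `python -m pip ...` sidesteps the PATH question entirely, because it uses whichever interpreter `python` resolves to.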

WinPython has a program called WinPython Command Prompt.exe, so I ran it and voilà:

D:\Progs\WPy64-3770\scripts>pip install rest_uploader
Collecting rest_uploader
  Downloading rest_uploader-1.5.0-py2.py3-none-any.whl (9.3 kB)
Requirement already satisfied: requests in d:\progs\wpy64-3770\python-3.7.7.amd64\lib\site-packages (from rest_uploader) (2.23.0)
Collecting Click>=6.0
  Downloading click-7.1.2-py2.py3-none-any.whl (82 kB)
     |████████████████████████████████| 82 kB 6.1 MB/s
Collecting tabulate
  Downloading tabulate-0.8.7-py3-none-any.whl (24 kB)
Collecting PyPDF2
  Downloading PyPDF2-1.26.0.tar.gz (77 kB)
     |████████████████████████████████| 77 kB 5.1 MB/s
Collecting pytesseract
  Downloading pytesseract-0.3.4.tar.gz (13 kB)
Collecting pdf2image
  Downloading pdf2image-1.13.1-py3-none-any.whl (10.0 kB)
Collecting watchdog
  Downloading watchdog-0.10.2.tar.gz (95 kB)
     |████████████████████████████████| 95 kB 6.4 MB/s
Requirement already satisfied: chardet<4,>=3.0.2 in d:\progs\wpy64-3770\python-3.7.7.amd64\lib\site-packages (from requests->rest_uploader) (3.0.4)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in d:\progs\wpy64-3770\python-3.7.7.amd64\lib\site-packages (from requests->rest_uploader) (1.25.9)
Requirement already satisfied: certifi>=2017.4.17 in d:\progs\wpy64-3770\python-3.7.7.amd64\lib\site-packages (from requests->rest_uploader) (2020.4.5.1)
Requirement already satisfied: idna<3,>=2.5 in d:\progs\wpy64-3770\python-3.7.7.amd64\lib\site-packages (from requests->rest_uploader) (2.9)
Collecting Pillow
  Downloading Pillow-7.1.2-cp37-cp37m-win_amd64.whl (2.0 MB)
     |████████████████████████████████| 2.0 MB 6.4 MB/s
Collecting pathtools>=0.1.1
  Downloading pathtools-0.1.2.tar.gz (11 kB)
Building wheels for collected packages: PyPDF2, pytesseract, watchdog, pathtools
  Building wheel for PyPDF2 (setup.py) ... done
  Created wheel for PyPDF2: filename=PyPDF2-1.26.0-py3-none-any.whl size=61087 sha256=f6cbc86401db9268147b8a3e5e5c83d5a981427bd999923d50f89e109a9c51f8
  Stored in directory: c:\users\roger\appdata\local\pip\cache\wheels\80\1a\24\648467ade3a77ed20f35cfd2badd32134e96dd25ca811e64b3
  Building wheel for pytesseract (setup.py) ... done
  Created wheel for pytesseract: filename=pytesseract-0.3.4-py2.py3-none-any.whl size=13439 sha256=46280639c984c8545fa5ecdee583c90bfae3fe6e49de19feec5d683e5955722c
  Stored in directory: c:\users\roger\appdata\local\pip\cache\wheels\51\71\e6\07d988150d601c7c4ec08665d9c048a6a4c1c57f25c2c34e53
  Building wheel for watchdog (setup.py) ... done
  Created wheel for watchdog: filename=watchdog-0.10.2-py3-none-any.whl size=73609 sha256=1b0317709c21be3ab74a566715865df324e78d554a3df10e68ed84bfe057a5aa
  Stored in directory: c:\users\roger\appdata\local\pip\cache\wheels\36\93\24\29e375ee74e0f178889e6906cb73e693e9f06a5f589dcee6b9
  Building wheel for pathtools (setup.py) ... done
  Created wheel for pathtools: filename=pathtools-0.1.2-py3-none-any.whl size=8790 sha256=c1c7e6babf79f35b7dc1a0305f8af9a83953b3ce2b1281959c4d08091e1c81fd
  Stored in directory: c:\users\roger\appdata\local\pip\cache\wheels\3e\31\09\fa59cef12cdcfecc627b3d24273699f390e71828921b2cbba2
Successfully built PyPDF2 pytesseract watchdog pathtools
Installing collected packages: Click, tabulate, PyPDF2, Pillow, pytesseract, pdf2image, pathtools, watchdog, rest-uploader
Successfully installed Click-7.1.2 Pillow-7.1.2 PyPDF2-1.26.0 pathtools-0.1.2 pdf2image-1.13.1 pytesseract-0.3.4 rest-uploader-1.5.0 tabulate-0.8.7 watchdog-0.10.2
WARNING: You are using pip version 20.0.2; however, version 20.1 is available.
You should consider upgrading via the 'D:\Progs\WPy64-3770\python-3.7.7.amd64\python.exe -m pip install --upgrade pip' command.
D:\Progs\WPy64-3770\scripts>

This prompts some questions

  • is that what you expected? is it all OK?
  • it says it downloaded rest_uploader…whl - so I needn’t have downloaded the .gz version?

Now I get:

D:\Progs\WPy64-3770>python -m rest_uploader d:\Temp\Scans
D:\Progs\WPy64-3770\python-3.7.7.amd64\python.exe: No module named  rest_uploader.__main__; 'rest_uploader' is a package and cannot be directly executed

I have found that rest_uploader.exe is present in D:\Progs\WPy64-3770\python-3.7.7.amd64\Scripts

Any ideas?

having installed via pip, you don’t need to call it as a python module… (although I would have thought that would work?) – just call it using rest_uploader.exe.
rest_uploader.exe d:\Temp\Scans
evidently the .exe is optional, at least on my Windows box.
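
As for why `python -m rest_uploader` fails: `python -m <package>` runs the package's `__main__.py`, and the error message says the installed package has none. The `rest_uploader.exe` that pip generates instead calls an entry-point function directly (the startup line "Launching Application rest_uploader.cli.main" suggests which one). The sketch below reproduces the behaviour with a throwaway package; `demo_pkg` is a hypothetical name used only for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_package(parent_dir, package_name):
    """Run `python -m package_name` with `parent_dir` as the working directory."""
    return subprocess.run(
        [sys.executable, "-m", package_name],
        cwd=str(parent_dir),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )


def demo():
    """Return (result_without_main, result_with_main) for a throwaway package."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"  # hypothetical package name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        # No __main__.py yet: "-m demo_pkg" has nothing to execute.
        without_main = run_package(tmp, "demo_pkg")

        # Adding __main__.py makes the package runnable with -m.
        (pkg / "__main__.py").write_text("print('running demo_pkg')\n")
        with_main = run_package(tmp, "demo_pkg")
    return without_main, with_main


if __name__ == "__main__":
    before, after = demo()
    print(before.returncode, before.stderr.strip())
    print(after.returncode, after.stdout.strip())
```

The first run fails with the same "is a package and cannot be directly executed" message seen above; the second succeeds.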

D:\Progs\WPy64-3770\python-3.7.7.amd64\Scripts>rest_uploader.exe d:\Temp\Scans  --autotag=no
Launching Application rest_uploader.cli.main
Language: eng
Automatically Tag Notes? no
Paste your Joplin API    Token:
Monitoring directory d:\Temp\Scans for files
Exception in thread Thread-1:
Traceback (most recent call last):
  File "D:\Progs\WPy64-3770\python-3.7.7.amd64\lib\threading.py", line 926, in _bootstrap_inner
    self.run()
  File "D:\Progs\WPy64-3770\python-3.7.7.amd64\lib\site-packages\watchdog\observers\api.py", line 196, in run
    self.dispatch_events(self.event_queue, self.timeout)
  File "D:\Progs\WPy64-3770\python-3.7.7.amd64\lib\site-packages\watchdog\observers\api.py", line 369, in dispatch_events
    handler.dispatch(event)
  File "D:\Progs\WPy64-3770\python-3.7.7.amd64\lib\site-packages\watchdog\events.py", line 336, in dispatch
    }[event.event_type](event)
  File "D:\Progs\WPy64-3770\python-3.7.7.amd64\lib\site-packages\rest_uploader\rest_uploader.py", line 70, in on_created
    self._event_handler(event.src_path)
  File "D:\Progs\WPy64-3770\python-3.7.7.amd64\lib\site-packages\rest_uploader\rest_uploader.py", line 53, in _event_handler
    if filesize < 1 or (ext == ".pdf" and not pdf_valid(path)):
  File "D:\Progs\WPy64-3770\python-3.7.7.amd64\lib\site-packages\rest_uploader\img_process.py", line 26, in pdf_valid
    if open_pdf(filename) is None:
  File "D:\Progs\WPy64-3770\python-3.7.7.amd64\lib\site-packages\rest_uploader\img_process.py", line 17, in open_pdf
    pdfFileObject = open(filename, "rb")
PermissionError: [Errno 13] Permission denied: 'd:\\Temp\\Scans\\ESET problem.pdf'

The app is working, yay! Looks like a permissions thing now. You might need to be more permissive on the directory it’s monitoring, or run the program using elevated permissions.
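
The traceback shows the `on_created` handler opening the PDF as soon as the file appears. One other possible cause, which is an assumption and not something confirmed in this thread, is that the file was still being written or copied at that moment: on Windows, opening a file that another process holds locked can also raise `PermissionError: [Errno 13]`. A minimal sketch of a retry loop that would cover that case; `wait_until_readable` is a hypothetical helper, not part of rest_uploader:

```python
import time


def wait_until_readable(path, attempts=10, delay=0.5):
    """Return True once `path` can be opened for reading, else False.

    Retries on PermissionError, which Windows can raise while another
    process (scanner software, a file copy) still holds the file open.
    """
    for _ in range(attempts):
        try:
            with open(path, "rb"):
                return True
        except PermissionError:
            time.sleep(delay)
    return False
```

If the error persists after several seconds of retries, genuine folder or file permissions are the more likely explanation.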

Tried running it in a Command Prompt window as administrator and nothing happens; I get the prompt back immediately upon pressing Enter.

Went back to the normal Command Prompt Window and similarly nothing happens when I type rest_uploader.exe anything. No idea what’s going on or not! rest_uploader.exe isn’t happy.

Try monitoring a different directory, or giving full read/write access to Everyone on the directory you wish to monitor.

Done both and no change. Typing just rest_uploader.exe with no folder path should produce a complaint, shouldn’t it? But it doesn’t.

How did you install Python? The defaults of the stock Python installer aren’t great (it doesn’t create an environment path var by default and installs just for the local user, not all users), and I’ve never tried installing from the Windows Store.
rest_uploader appears to be working in your WinPython Command Prompt, though, if that’s where you’re getting the PermissionError – it just doesn’t like the permissions on the folder/file you’re dropping in the folder for whatever reason.

Really appreciate your help with this. Have to give it a rest for now - wife calling! Will experiment tomorrow

Right on - let me know if you get it working - it has really helped my workflow!

Not working yet. New attempt to get rest_uploader working

  • downloaded Winpython64-3.7.7.0dot.exe, a portable Windows Python in a 7zip auto-extractor, from https://github.com/winpython/winpython/releases/tag/2.3.20200319
    (got to from https://winpython.github.io/)
  • ran it, extracted to D:\Progs, it created its root as WPy64-3770
    (there is no ‘install’ as it is a portable application)
  • ran D:\Progs\WPy64-3770\WinPython Command Prompt.exe
  • typed pip install rest_uploader
    this resulted in a download of 645 files (9.7 MB) added to
    D:\Progs\WPy64-3770\python-3.7.7.amd64\Lib\site-packages
    plus some exe files in D:\Progs\WPy64-3770\python-3.7.7.amd64\Scripts
    one of which is rest_uploader.exe
  • checked Everyone had Full control access of D:\Temp\Scans and one test pdf was present
  • checked PATH includes D:\Progs\Poppler-0.68.0\bin;D:\Progs\Tesseract-OCR
  • ran D:\Progs\WPy64-3770\WinPython Command Prompt.exe
  • typed rest_uploader.exe d:\Temp\Scans --autotag=no
  • started OK and asked for the token the first time; then the cursor sat blinking and nothing arrived in the inbox notebook

No idea what to do now, any suggestions?

It won’t do anything with existing files in the folder, only new ones. Try dropping a new file in the folder while it is running…
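
The reason is that filesystem watchers like watchdog report changes that happen after the observer starts; files already sitting in the folder generate no "created" event. If picking up the backlog were wanted, one approach would be to list the folder once at startup and feed those files through the same handler. A minimal sketch of the listing step, with a hypothetical function name and a `.part` filter that mirrors the release notes above:

```python
from pathlib import Path


def existing_files(directory, ignored_suffixes=(".part",)):
    """Return files already in `directory`, oldest first.

    Subdirectories and files with an ignored suffix (partial downloads)
    are skipped.
    """
    files = [
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() not in ignored_suffixes
    ]
    # Sort by modification time, using the name to break ties.
    return sorted(files, key=lambda p: (p.stat().st_mtime, p.name))
```

Each returned path could then be handed to whatever function processes newly created files.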