File Uploader and OCR

Any chance to run this on a Windows machine :)? How?

Step 1: erase your Windows partition and install your Linux distro of choice... .. j/k :smile:

The GitHub link has directions for Windows. It's a little trickier, but it does work. The main difference is the initial requirement that you manually install poppler and tesseract. Make sure the PATH environment variable covers both, so that tesseract and the poppler tools can be run from a command prompt.
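For anyone who wants to verify that step before launching rest_uploader, here is a minimal sketch that checks whether the two dependencies can be found on PATH. It assumes the executable names `tesseract` and `pdftoppm` (one of the command-line tools that ship with poppler); the helper name `find_tools` is made up for this example and is not part of rest_uploader.

```python
import shutil

# Assumed executable names: "tesseract" for Tesseract OCR and "pdftoppm",
# one of the command-line tools shipped with poppler.
REQUIRED_TOOLS = ("tesseract", "pdftoppm")


def find_tools(names=REQUIRED_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

If either tool is reported as not found, the install directory is most likely missing from PATH.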

Yes, it does work well under Windows.

Is there a manual available for the Windows setup?

@iknow79 2 comments above it literally says: The github link has directions for Windows

I was able to use rest_uploader to get a server-based, one-button scan to Joplin. Please let me know how I can improve:

Thanks, Steve. The workflow you describe is very similar to the one I’ve used over the years - it’s why I built rest_uploader. I have a cheap network scanner, which uploads a scanned doc via FTP to a monitored directory on a computer running rest_uploader and Joplin. I’m not using joplin-cli though I totally get what you’re saying. Unfortunately, I’m not aware of a more streamlined way to accomplish it, but someone else in this thread may have a better idea!

I also started using rest_uploader and find it very useful for my workflow.

Two things I would love for this

  1. Update already existing notes
    At least for *.txt files, it would be nice if the uploader recognized that an existing note changed (via checksum or whatever) and uploaded that (modified) file again.
  2. Have some way to feed a note into an existing (sub)notebook somewhere in the document structure. As I understand it, all notes currently go into “document-root”.
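To make the checksum idea in the first request concrete, here is a minimal sketch of content-based change detection. It is not how rest_uploader works today; the function names and the in-memory `seen` dictionary are assumptions for illustration, and a real version would need to persist the digests and decide whether to update or re-create the note.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_files(directory, seen, pattern="*.txt"):
    """Return files whose content differs from the digest recorded in `seen`.

    `seen` maps path strings to digests and is updated in place, so a file
    is reported once per distinct content version.
    """
    changed = []
    for path in sorted(Path(directory).glob(pattern)):
        digest = file_digest(path)
        if seen.get(str(path)) != digest:
            seen[str(path)] = digest
            changed.append(path)
    return changed
```

Because the comparison is on content rather than modification time, re-saving a file without changing it would not trigger another upload.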

Hi resi,
Thanks for the feedback!
#1 - I don’t think it’d be terribly difficult to implement, but my concern would be if you’re a save-early-save-often person working on a file in the directory – you’d spam your upload notebook with a bunch of slightly modified files. Could get real ugly real quick.
#2 - You can change the upload notebook by modifying the settings.py file in the rest_uploader installation directory, which you should be able to find in the site-packages directory of your Python installation. In the future, this would be better as a command line option.

I’ve modified it to use the folder that has the note most recently tagged with ‘here’.

Hi kellerjustin,
thanks for your answer. Regarding #1, it's less a save-often thing: I have some notes in my local filesystem (created with other tools, not with Joplin) and I want them reflected in Joplin as well, which is really nice because then they are also available on mobile. These notes might change, not necessarily very often, but when they do I of course want the newest version in Joplin. Maybe this could be configurable, so it can be turned on or off.
#2 I see, I was not aware of this configuration. At least notes would go to my "inbox", which is much better than "root", but one step further would be to process the observed local directory recursively, so that subdirectories become sub-notebooks in the Joplin structure.
But maybe I'm asking for too much :slight_smile:
Nonetheless thanks for your work
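The recursive idea above could be sketched as a mapping from a file's location under the watched directory to a nested notebook path. This is only an illustration: `notebook_path_for`, `plan_uploads`, and the `root_notebook` default are hypothetical names, and creating the notebooks in Joplin is left out entirely.

```python
import os


def notebook_path_for(file_path, watch_root, root_notebook="inbox"):
    """Derive a nested notebook path from a file's location under watch_root."""
    rel_dir = os.path.dirname(os.path.relpath(file_path, watch_root))
    parts = [root_notebook]
    if rel_dir:
        # Each subdirectory level becomes one sub-notebook level.
        parts.extend(rel_dir.split(os.sep))
    return parts


def plan_uploads(watch_root, root_notebook="inbox"):
    """Walk watch_root and yield (file_path, notebook_parts) pairs."""
    for dirpath, _dirnames, filenames in os.walk(watch_root):
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            yield full, notebook_path_for(full, watch_root, root_notebook)
```

A file directly in the watched directory would map to the root notebook alone, while a file two directories down would map to a three-level notebook path.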

Hi, somehow this does not work for me on either Ubuntu 19 or 18. Running rest_uploader seems to properly monitor and OCR the file, but then it basically stops. Joplin is up and running, the clipping service is up and running, and the key is properly entered. Somehow the procedure seems to stop and does not create any entry in Joplin. What am I missing here?

python3 -m  rest_uploader.cli /home/pete/Downloads
    Launching Application rest_uploader.cli.main
    Language: eng
    Monitoring directory /home/pete/Downloads for files
    created -- /home/pete/Downloads/Unbenannt 2.pdf
    {'id': 'a58afb99dac0488e83c6d668712f96de', 'title': 'Unbenannt 2', 'mime': 'application/pdf', 'filename': 'Unbenannt 2.pdf', 'created_time': 1578006467355, 'updated_time': 1578006467355, 'user_created_time': 1578006467355, 'user_updated_time': 1578006467355, 'file_extension': 'pdf', 'encryption_cipher_text': '', 'encryption_applied': 0, 'encryption_blob_encrypted': 0, 'size': 8014, 'type_': 4}
    <Response [200]>
    {"title":"Unbenannt 2","body":"Unbenannt 2.pdf uploaded from pete-VirtualBox\n[Unbenannt 2.pdf](:/a58afb99dac0488e83c6d668712f96de)\n<!---\n\n\n***PAGE 1 of 1*** \n\nTest 1234345rtty\n-->\n\n\n![44aeef3f02d4591b2aab77b543325913.png](:/e4e7ebcae27b4103a3904c9e5f49b418)\n\n","parent_id":"0","markup_language":1,"updated_time":1578006469089,"created_time":1578006468867,"source":"joplin-desktop","source_application":"net.cozic.joplin-desktop","id":"3345fc6bffaf45479b2636f24800d1fa","user_updated_time":1578006469089,"user_created_time":1578006468867,"type_":1}
    {'title': 'Unbenannt 2', 'body': 'Unbenannt 2.pdf uploaded from pete-VirtualBox\n[Unbenannt 2.pdf](:/a58afb99dac0488e83c6d668712f96de)\n<!---\n\n\n***PAGE 1 of 1*** \n\nTest 1234345rtty\n-->\n\n\n![44aeef3f02d4591b2aab77b543325913.png](:/e4e7ebcae27b4103a3904c9e5f49b418)\n\n', 'parent_id': '0', 'markup_language': 1, 'updated_time': 1578006469089, 'created_time': 1578006468867, 'source': 'joplin-desktop', 'source_application': 'net.cozic.joplin-desktop', 'id': '3345fc6bffaf45479b2636f24800d1fa', 'user_updated_time': 1578006469089, 'user_created_time': 1578006468867, 'type_': 1}

I don’t see an error message in the output - the default behavior is for newly created notes to go into a folder called “inbox”, and in the absence of an inbox folder, the results can be unpredictable - is it possible the note was created in a folder where you don’t expect it to be?
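One way to rule this out before uploading is to ask Joplin which notebooks exist. The sketch below assumes the Web Clipper service is listening on its default port 41184 and that `token` is the API token from Joplin's Web Clipper settings. `fetch_folders` and `find_notebook` are names made up for this example, the case-insensitive comparison is a choice made here, and only the top level of the first page of results is inspected.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Joplin's Web Clipper service is listening on its default port.
BASE_URL = "http://localhost:41184"


def fetch_folders(token, base_url=BASE_URL):
    """Fetch one page of notebooks from the Joplin Data API."""
    query = urllib.parse.urlencode({"token": token})
    with urllib.request.urlopen(f"{base_url}/folders?{query}") as resp:
        return json.load(resp)


def find_notebook(payload, title="inbox"):
    """Return the first notebook whose title matches, ignoring case.

    Accepts either a bare list of folders or a paginated {"items": [...]}
    dict, since the response shape differs between Joplin versions.
    """
    folders = payload.get("items", []) if isinstance(payload, dict) else payload
    for folder in folders:
        if folder.get("title", "").lower() == title.lower():
            return folder
    return None
```

If `find_notebook(fetch_folders(token))` returns `None`, creating an "inbox" notebook in Joplin before starting rest_uploader should avoid the lost notes.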

Great, thanks, that was actually the issue. I should have created the inbox folder; without it the notes end up nowhere…

Very cool, this was the missing piece for me to finally get rid of evernote, so thanks for your great work!

Excellent! I should document the notebook thing a little better / put it in the cli output or something.

Great to hear your feedback, thanks!

I just tried to get a headless setup going and was able to do so using @steve28’s approach. However, when I set the monitoring path to the mountpoint of a WebDAV mount, the script failed to detect new files when I added them on the WebDAV server. Adding them locally (directly to the mountpoint) worked just fine. Not sure what’s causing this.

Yeah, I had that same problem. I was able to get it working with an NFS mount, however. For me it was an SMB mount that wouldn’t detect new files.

@Shamp0o @steve28 watchdog is used to monitor changes on the filesystem. Depending on the OS and the type of the mount point, it’s possible that it can fail to detect changes. But these are problems that have to be fixed in the watchdog code.

For example, on Linux the watcher relies on kernel change notifications (inotify). Network filesystems such as NFS or SMB may not propagate changes made on the remote side as local events, and some components may not support them at all.
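When change notifications do not arrive on a network mount, a common fallback is to poll: list the directory at intervals and compare against the previous listing. The following is a standard-library sketch of that idea, not rest_uploader's code; the function names and the 5-second default interval are assumptions. watchdog itself also ships a polling-based observer (`PollingObserver` in `watchdog.observers.polling`), though whether rest_uploader can be switched to it is not covered in this thread.

```python
import os
import time


def snapshot(directory):
    """Map each regular file name in directory to its (mtime, size)."""
    state = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                st = entry.stat()
                state[entry.name] = (st.st_mtime, st.st_size)
    return state


def new_files(before, after):
    """Names present in `after` but not in `before`, sorted."""
    return sorted(set(after) - set(before))


def poll(directory, interval=5.0):
    """Yield newly appeared file names by re-listing the directory forever."""
    previous = snapshot(directory)
    while True:
        time.sleep(interval)
        current = snapshot(directory)
        for name in new_files(previous, current):
            yield name
        previous = current
```

Polling trades latency and some I/O for independence from filesystem events, which is why it tends to work on mounts where event-based watching does not.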

I see. Well, I guess it’s time to look into other solutions then.

@steve28 did you change any settings in your terminal Joplin app to automatically sync whenever there’s a change? I have to manually sync in order to make notes created by rest_uploader appear. Running a regular instance of Joplin alongside the server, like you described in your post, doesn’t work for me.

update

I released a new version. It adds automatic tagging functionality based on OCR/body text.

Changes:

  • Ignore .part files (caused app crashes on KDE systems)
  • Added logic to automatically assign tags to newly created notes
  • Added click option to turn off automatic tag generation

The tag matching logic is not overly sophisticated, so if you don’t like it, just turn it off using the command line option --autotag=no
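For readers wondering what simple tag matching on OCR/body text might look like, here is a sketch under stated assumptions. It is not the actual rest_uploader logic, which is not shown in this thread: `match_tags` is a hypothetical helper that returns every known tag appearing as a whole word in the note body, ignoring case.

```python
import re


def match_tags(body, known_tags):
    """Return the known tags that appear as whole words in body, ignoring case."""
    matched = []
    for tag in known_tags:
        # Lookarounds keep "tax" from matching inside "taxonomy".
        pattern = r"(?<!\w)" + re.escape(tag) + r"(?!\w)"
        if re.search(pattern, body, flags=re.IGNORECASE):
            matched.append(tag)
    return matched
```

Matching of this kind will produce false positives on common words, which is one reason a switch to turn it off is useful.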

Thanks!
