Simple Note (JSON) to Joplin (ENEX) - python script

Hi,
I’ve written a Python script to convert Simple Note export JSON files into ENEX.
It is available at https://github.com/rpgd60/simplenote2enex

The ENEX file format generated by the tool is based on Evernote’s ENEX sample.

Even if you do not use Simple Note, I would appreciate feedback on the “quality” (from general and a Joplin import point of view) of the generated ENEX format.

A sample ENEX file generated by the script is available at: https://github.com/rpgd60/simplenote2enex/blob/master/all.test1.enex

Thanks for any comments or feedback
Rafa

This is likely user error as I'm pretty new to dealing with Python, but I get this error after trying to run your script from PowerShell on Windows 10:

File "", line 1
simplenote2joplin.py --json-file notes.json --author 'insert name' --create-title --verbose-level 1 > all_notes_converted.enex
^^^^^
SyntaxError: invalid syntax

Is this the complete traceback? File "" looks odd.

I'm not familiar with Windows and didn't use the script, but here are a few general ideas. Maybe they can help identify the issue:

  1. Is your complete command on one line? It should be something like
python simplenote2joplin.py --json-file notes.json --author 'insert name' --create-title --verbose-level 1 > all_notes_converted.enex
  2. Are you using Python 3.6 or newer (as mentioned in the readme)? You can check your Python version by running python -V

As an alternative approach, you could try: Effilicious | Simplenote to Joplin import tutorial

Yep, this was tried on 3.12.0.

And no, that's not the whole traceback; it didn't render properly because I didn't enclose it in pre-formatted markdown text when I copy-pasted.

>>> simplenote2joplin.py --json-file notes.json --author 'insert name' --create-title --verbose-level 1 > all_notes_converted.enex
  File "<stdin>", line 1
    simplenote2joplin.py --json-file notes.json --author 'insert name' --create-title --verbose-level 1 > all_notes_converted.enex
                                     ^^^^^
SyntaxError: invalid syntax
>>>
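A note on the output above: the >>> prompt and File "<stdin>" show that the command was typed into the interactive Python interpreter rather than into PowerShell itself, so Python tried to parse a shell command line as Python source. A minimal sketch of that effect follows; the helper name is_valid_python is made up for illustration and is not part of the script.

```python
# The shell command from the post, treated as if it were Python source code.
command = (
    "simplenote2joplin.py --json-file notes.json --author 'insert name' "
    "--create-title --verbose-level 1 > all_notes_converted.enex"
)


def is_valid_python(source: str) -> bool:
    """Hypothetical helper: report whether `source` parses as Python code."""
    try:
        compile(source, "<stdin>", "exec")
    except SyntaxError:
        return False
    return True


# A shell command line is not valid Python, which is what the REPL complained about.
print(is_valid_python(command))
# An ordinary Python statement parses fine.
print(is_valid_python("print('hello')"))
```

Leaving the interpreter with exit() and running the command at the normal PowerShell prompt, prefixed with python, avoids this particular error.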

As for the command on a single line, I've tried that already, it just yields:

python simplenote2joplin.py --json-file notes.json --author 'insert name' --create-title --verbose-level 1 > all_notes_converted.enex

C:\Python312\python.exe: can't open file 'C:\\Users\\username\\simplenote2joplin.py': [Errno 2] No such file or directory

Which is strange, because I've already tried adding all sorts of folder paths, and combinations of them, to the system environment variables, pointing to where the script is, and I've moved the script around to see whether one folder suits it better than another. PowerShell seems intent on staying in the current directory (which for me is my user folder), even though what seem like the correct paths have been added to PATH so that I wouldn't have to type the exact path.

I've seen that Effilicious post, and will probably settle for what's described therein, just converting them to markdown files, but the goal in trying to run a Python script from pwsh was to attempt acclimation to Python, PowerShell too, really, with something low-key and low-risk.

I have pyenv, so I might try switching to a version older than 3.12 but no older than 3.6 to see if that changes anything.

Edit I: I might also add, the script's read-me file isn't very clear where exactly the .json file needs to be, if anywhere in particular. I haven't actually looked at its contents, but that's probably shooting myself in the foot if troubleshooting is what needs to be done, haha.

Edit II: As it so happens, when attempting to run the script from the current working PowerShell directory, even moving the .json file into the same folder as the script, the following occurs:

python simplenote2joplin.py --json-file notes.json --author 'insert name' --create-title --verbose-level 1 > all_notes_converted.enex
C:\Users\username\python_scripts\simplenote2joplin.py:147: SyntaxWarning: invalid escape sequence '\A'
  temp_string = re.sub("\A" + pattern, "", temp_string)
C:\Users\username\python_scripts\simplenote2joplin.py:149: SyntaxWarning: invalid escape sequence '\Z'
  temp_string = re.sub(pattern + "\Z", "", temp_string)
Traceback (most recent call last):
  File "C:\Users\username\python_scripts\simplenote2joplin.py", line 367, in <module>
    main(args)
  File "C:\Users\username\python_scripts\simplenote2joplin.py", line 340, in main
    enex_file = sne.process_file()
                ^^^^^^^^^^^^^^^^^^
  File "C:\Users\username\python_scripts\simplenote2joplin.py", line 262, in process_file
    simplenotes = json.load(jfp)
                  ^^^^^^^^^^^^^^
  File "C:\Python312\Lib\json\__init__.py", line 293, in load
    return loads(fp.read(),
                 ^^^^^^^^^
  File "C:\Python312\Lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 29419: character maps to <undefined>
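About the two SyntaxWarning lines at the top of that output: they are separate from the UnicodeDecodeError and do not stop the script. Since Python 3.12, an unrecognized escape such as "\A" in a normal string literal triggers a SyntaxWarning, and the usual remedy is a raw string. The sketch below mirrors the two re.sub calls shown in the warning; the function name strip_edges and the sample pattern are assumptions, since the script's real pattern is not shown in this thread.

```python
import re


def strip_edges(text: str, pattern: str) -> str:
    """Remove `pattern` at the very start and very end of `text`.

    Raw strings (r"...") keep the backslash in \\A and \\Z intact for the
    regex engine without Python flagging them as invalid escape sequences.
    """
    text = re.sub(r"\A" + pattern, "", text)
    text = re.sub(pattern + r"\Z", "", text)
    return text


# Hypothetical pattern: runs of whitespace.
cleaned = strip_edges("  hello world \n", r"\s+")
print(repr(cleaned))
```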

You can specify the full path after --json-file, but in the simplest case, all files are in the same directory.

That looks like malformed json. I guess we would need to see the original file or some sample file with the same error. Maybe @rafap has an idea?

Right, that's what I thought, but it still didn't want to play nice, even with either method.

Well, I suspect it has to do with the contents of more than a few notes containing characters outside of the core ASCII values. Because I'm an amateur language learner/enthusiast, there are plenty of Latin letters with diacritics or kanji and such littered throughout some of them. If I had to guess, I'd say the script returned the UnicodeDecodeError upon hitting the first instance of something outside of UTF-8, whatever's at "position 29419".
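One possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: it passes through encodings\cp1252.py, which suggests the JSON file was opened without an explicit encoding and decoded with the Windows default code page instead of UTF-8. Byte 0x9d has no mapping in Python's cp1252 codec, which matches the error message. A minimal sketch of reading the file as UTF-8 follows; load_notes, decodes_as, and the "notes" key are hypothetical names and do not come from the script.

```python
import json
import tempfile
from pathlib import Path


def load_notes(path):
    """Hypothetical loader: always decode the JSON file as UTF-8."""
    with open(path, encoding="utf-8") as fp:
        return json.load(fp)


def decodes_as(raw: bytes, encoding: str) -> bool:
    """Report whether `raw` can be decoded with the given encoding."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Made-up sample data containing curly quotes (U+201C and U+201D).
sample = {"notes": [{"content": "She said \u201chello\u201d"}]}

with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "notes.json"
    # Write the file as UTF-8 without escaping the non-ASCII characters.
    json_path.write_text(json.dumps(sample, ensure_ascii=False), encoding="utf-8")
    raw = json_path.read_bytes()
    loaded = load_notes(json_path)

print(decodes_as(raw, "utf-8"))   # decoding as UTF-8 works
print(decodes_as(raw, "cp1252"))  # decoding as cp1252 fails on byte 0x9d
```

If this is what is happening, setting the environment variable PYTHONUTF8=1 before running the script may also work without touching the code, because it makes open() default to UTF-8.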

Found this comment from "user65839" on Stack Overflow that might help rafap:

Here's a completely wild hypothesis:

Some prior (really broken) system working on this data attempted to write each character as UTF-8, but actually only wrote the last byte of each sequence (maybe it had a weird one-byte-long buffer somewhere). Alternatively, it was in UTF-8 in the past, but somebody viewing it in a different encoding did a search-and-replace to remove bytes 0xE2 0x80 because they clearly "didn't belong" and didn't realize that the remaining "special character" wasn't the one they wanted either.

ASCII, would of course, be passed through as its UTF-8 encoding would be one byte long.

The 'RIGHT SINGLE QUOTATION MARK' (U+2019) ’ is encoded in UTF-8 with bytes 0xE2 0x80 0x99. The places where you have \x99s is what made me go down this path, since the apostrophe before an s would often be translated to a right curly quotation mark in popular word processing software. If only the last byte of the character was saved, you'd just have the 0x99 there.

The 'RIGHT DOUBLE QUOTATION MARK' (U+201D) ” is encoded in UTF-8 with bytes 0xE2 0x80 0x9D. The 0x9D that you have in your text is often at the end of a double-quoted string. And, it's often right next to a regular straight " double-quote. I wonder if somebody had tried to do some sort of prior clean-up pass on the data, and managed to put back in the closing quote, but left the "weird" 0x9D in there.

As I said, it's a wild hypothesis, but if this is a conglomeration of data from a variety of old systems, it's hard to know what exactly may have happened to it. The last byte of UTF-8 was just the closest "normal" English encoding I could find that would have something reasonable in English text and included the bytes you were looking for.

(https://stackoverflow.com/questions/45749093/in-what-8-bit-character-set-is-0x9d-meaningful)
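The byte sequences quoted in that comment are easy to check from Python. This is only a small verification sketch; the helper name utf8_hex is made up for illustration.

```python
import unicodedata


def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of `ch` as space-separated hex bytes."""
    return ch.encode("utf-8").hex(" ")


for ch in ("\u2019", "\u201d"):
    # Prints the Unicode name followed by the UTF-8 bytes of the character.
    print(unicodedata.name(ch), utf8_hex(ch))
```

Both characters end in the byte the comment mentions (0x99 and 0x9d respectively), so a UTF-8 file containing curly quotes will contain those bytes.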

In the meantime, I've since just converted my .txt files to .md ones, so no rush, @rafap!
It's been a while since last you were active on this thread, and there's certainly no fire, but hopefully my use case can contribute to your future troubleshooting/bug hunting, if there is indeed an issue on the backend. Thanks, @Marph. Cheers!

Hi,
rafap here. This is really a blast from the past for me. Glad someone still uses this utility.

Could you kindly send me a sanitized json file that triggers the error?
Due to current workload I may not be able to fix it immediately, but I will give you feedback asap.

Warning to future readers of this thread.

To set the expectations right, please see the section in the repo README where I explain that this app does not really convert JSON to full-fledged ENEX, but to some kind of an ENEX envelope that Joplin (sort of) imports.

If I could, I would change the title of this topic to remove the reference to ENEX.
