Hi,
What is the recommended way to write a bulk-import script, retaining as much metadata as possible (e.g. creation/edit dates, and tags) please?
As the first step towards converting 14 years of TiddlyWiki notes to Joplin, I've been researching ways to import content. Before writing any scripts, I did some proof-of-concept testing on ways to get notes into Joplin, but have hit a couple of gaps.
Idea 1 - JEX files (complex, but metadata works)
My first thought was to hand-generate JEX (Joplin Export) files, each holding one note, and use the client to import the data directly. After some experimentation, I found that the client export feature (File -> Export -> JEX) actually generates a tar file containing the note, rather than the note itself.
After much experimentation, I can hand-generate test JEX files that import successfully, but the process is a bit painful:
- Create one file per note, with a unique GID and filename = GID.md
- Remove the usual end of line ‘\n’ from each file (e.g. in vim set binary, set noeol OR truncate -s -1 file.md). Error messages caused by importing a file with a ‘normal’ EoL threw me for quite a while!
- Create a tar file from the notes (e.g. tar cvf test_file.jex 1234.md 5678.md)
This works, but parent_id is ignored - a new Notebook is created from the tar file name (e.g. ‘test_file’ in the above tar example). Other metadata, such as the note timestamps and author, are retained.
Is this how JEX files are supposed to work, or is there an easier way please?
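For reference, the three steps above can be sketched as a small shell script. This is only a sketch under assumptions: write_note is a hypothetical helper, 1234 and 5678 stand in for real GIDs as in the tar example, and the metadata lines written after the body are an assumed, cut-down layout. A real note will likely need the full field list copied from a note exported by the client.

```shell
#!/bin/sh
# Sketch: build a JEX-style tar archive from hand-written note files.
set -eu

out_dir=$(mktemp -d)   # scratch directory for the generated files

# write_note is a hypothetical helper: ID, title, body, timestamp.
# The metadata lines after the body are an ASSUMED layout; copy the
# real field list from a note exported by your own client.
write_note() {
    # The format string does not end in "\n", so the file has no final newline.
    printf '%s\n\n%s\n\nid: %s\ncreated_time: %s\nupdated_time: %s\ntype_: 1' \
        "$2" "$3" "$1" "$4" "$4" > "$out_dir/$1.md"
}

write_note 1234 'First note'  'Body of the first note.'  '2019-01-01T00:00:00.000Z'
write_note 5678 'Second note' 'Body of the second note.' '2019-01-02T00:00:00.000Z'

# Archive from inside the directory so the tar holds bare GID.md names.
(cd "$out_dir" && tar cf test_file.jex 1234.md 5678.md)

# List the archive contents as a quick check.
tar tf "$out_dir/test_file.jex"
```

Writing each file with printf and no closing "\n" avoids the separate truncate step, since the final newline is never written in the first place.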
Idea 2 - API JoplinClipperServer (easier, but only basic metadata supported)
Digging about in this forum, I found a few useful scripts which create notes using the Joplin desktop client and its JSON API.
The addition of an API to the client is an unexpectedly useful feature: it avoids the need to install the command line client or set up multiple synchronisations to the back-end data store.
It was pretty easy to enable the API, and connect to create a new note in a specific Notebook / folder:
# Create a new note in a specific book
curl 'http://localhost:41184/notes?token=<long API key>' --data '{ "parent_id": "<notebook GID>", "title": "Test created note", "body": "# Body text\nLorem ipsum and all that.\n* One\n* Two\n", "created_time": "1546300800", "updated_time": "1546387200", "user_created_time": "1546473600", "user_updated_time": "1546560000", "author": "The Author"}'
My problem is the new note uses default metadata - the timestamps are lost, which is a shame when I have 14 years of notes with lots of context to retain.
After looking at the API source in ClipperServer.js, the reason is clear.
Lines 75, 76 set the title and body, but 96, 97 only set source_url and author - timestamps aren’t supported so are ignored. The extra lines to support the four *_time fields seem simple at first, but I’m not sure how much input validation and format conversion code is needed. Notes seem to store dates as ASCII date strings, while the API returns Unix epoch time in milliseconds.
Are there any plans to expand the supported API properties please?
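On the format conversion point, the values in the curl test above are in seconds (1546300800 is 2019-01-01 00:00:00 UTC), so matching what the API returns would mean multiplying by 1000. A minimal sketch of that conversion, assuming GNU date (as found on Linux) and an ISO 8601 style input string; to_epoch_ms is a hypothetical helper name, and nothing here implies the API currently accepts these fields.

```shell
#!/bin/sh
# Sketch: convert an ASCII date string to Unix epoch milliseconds.
set -eu

# to_epoch_ms is a hypothetical helper. It relies on GNU date's -d option
# (not available in BSD/macOS date) and ASSUMES an ISO 8601 UTC input.
to_epoch_ms() {
    seconds=$(date -u -d "$1" +%s)
    # %s yields whole seconds, so scale up to milliseconds.
    echo $(( seconds * 1000 ))
}

to_epoch_ms '2019-01-01T00:00:00Z'   # prints 1546300800000
```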
I’ve not tried tags yet, but they are another set of metadata I’d like to port in.
I’ve not (yet) tried the Joplin command line client, having read that it is a separate program from the GUI client and shouldn’t be run against the same data store. I’m on Linux with a NextCloud backend, so as long as only one program is running at any one time it looks like a single client database might work.
I’m also a bit worried that the command line might cause a lot of string escape problems - escaping all the special characters in notes which themselves contain script fragments sounds like a recipe for disaster!
Thanks for any advice you can offer!
James