Where are tags stored?

There were some discussions in the past about the ability to use a #hashtag as a quick way to include tags in a note (Store tags inside notes). The enthusiasm was limited so I am considering writing a batch processor which would take all my notes, extract the tags and add them to Joplin.

I did not dig into the problem yet but a quick glance over the files did not show an obvious place with the tags (to be frank, it means that they did not wave to me saying that they are there - I really did not look too closely yet).

Is there a place where tag generation and storage are documented?

For scripting, the best option is to use the API (https://joplinapp.org/api). See in particular the /tags endpoint.
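A minimal sketch of what calling the API from Python could look like. It assumes the API is reachable on localhost at port 41184 (the usual default) and that you have an API token; `TOKEN` is a placeholder and the helper names are made up for illustration. Only the /tags and /tags/:id/notes endpoints come from the API documentation. Depending on your Joplin version, a list call may return a plain array or a paginated object, so check what you actually get back.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:41184"  # assumption: default local API port
TOKEN = "YOUR_API_TOKEN"             # placeholder, replace with your own token


def api_url(path, token=TOKEN, base=BASE_URL):
    """Build a request URL with the token passed as a query parameter."""
    return f"{base}{path}?{urllib.parse.urlencode({'token': token})}"


def api_call(method, path, payload=None):
    """Send one JSON request to the API and return the decoded response."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(
        api_url(path),
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = response.read()
    return json.loads(body) if body else None


def list_tags():
    """Fetch the existing tags."""
    return api_call("GET", "/tags")


def create_tag(title):
    """Create a tag with the given title."""
    return api_call("POST", "/tags", {"title": title})


def add_tag_to_note(tag_id, note_id):
    """Attach an existing tag to an existing note."""
    return api_call("POST", f"/tags/{tag_id}/notes", {"id": note_id})
```

None of these functions touch the files on the sync target; they only talk to a running Joplin instance.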

I saw that possibility but AFAICT the API is only available on “volatile” devices (such as a PC) while I would like the batch to run continuously (so either against an API running on my server, or directly on files).

Not sure if that kind of automation is currently possible. Perhaps the headless server of the CLI client might help you get there:

server <command>

    Start, stop or check the API server. To specify on which port it should 
    run, set the api.port config variable. Commands are (start|stop|status). 
    This is an experimental feature - use at your own risks! It is recommended 
    that the server runs off its own separate profile so that no two CLI 
    instances access that profile at the same time. Use --profile to specify 
    the profile path.

One option would be to run the cli server on a headless machine and create an ssh tunnel.

I would also like to know where the tags are stored, just so I understand my own data security in the event of e.g. data corruption that no-one here can help me with. Indeed, perhaps data corruption at some future point when Joplin is no longer actively maintained.

It’s all (pseudo)markdown. If you look in your sync target (Dropbox/WebDAV/whatever) you’ll see a bunch of .md files. Here’s an example:

keep-import

id: 44d208719e844337b2b84d8e55391012
created_time: 2020-02-29T18:09:35.814Z
updated_time: 2020-02-29T18:09:35.814Z
user_created_time: 2020-02-29T18:09:35.814Z
user_updated_time: 2020-02-29T18:09:35.814Z
encryption_cipher_text: 
encryption_applied: 0
is_shared: 0
type_: 5

type_: 5 here means it’s a tag.
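Reading such a file back takes only a few lines. This is a rough sketch, not Joplin's own code: it assumes the layout seen in this thread (an optional title and body at the top, a blank line, then a block of `key: value` properties at the very end), and `parse_item` is a name made up for illustration.

```python
def parse_item(text):
    """Split a serialized item into its title, body and trailing properties."""
    lines = text.rstrip("\n").split("\n")
    props = {}
    # Walk up from the last line until the blank separator line is reached.
    while lines and lines[-1].strip():
        # Split at the first colon only: timestamps contain colons too.
        key, _, value = lines.pop().partition(":")
        props[key.strip()] = value.strip()
    # Drop the blank line(s) between the content and the properties.
    while lines and not lines[-1].strip():
        lines.pop()
    # Whatever is left is the title (first line) and the body (the rest).
    props["title"] = lines[0] if lines else ""
    props["body"] = "\n".join(lines[1:]).strip("\n")
    return props


# The tag file quoted above, shortened to a few of its properties.
TAG_FILE = """keep-import

id: 44d208719e844337b2b84d8e55391012
created_time: 2020-02-29T18:09:35.814Z
encryption_cipher_text: 
encryption_applied: 0
type_: 5
"""

tag = parse_item(TAG_FILE)
print(tag["title"], tag["type_"])
```

All values come back as strings, so `type_` is `"5"` rather than the number 5.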

Ah, I see. So each tag has its own .md file containing its string and an id.

Then each note-has-a-tag instance is an .md file of type 6, like this, right?

id: 61d5ac29f8bf4e3ba78988fa0fcbca57
note_id: 89a2a47379494b05a99e8e205fdcbe8a
tag_id: 43ad0205f4044238a85e49a9b0598dbe
created_time: 2020-04-28T10:29:57.171Z
updated_time: 2020-04-28T10:29:57.171Z
user_created_time: 2020-04-28T10:29:57.171Z
user_updated_time: 2020-04-28T10:29:57.171Z
encryption_cipher_text:
encryption_applied: 0
is_shared: 0
type_: 6

Yes that's correct.
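To make the relationship concrete, here is a small sketch that joins the two kinds of records into a "note id to tag names" mapping. It assumes each file has already been read into a dict of strings with the property names shown above; the ids in the example are invented, and `tags_by_note` is a hypothetical helper.

```python
from collections import defaultdict

TYPE_TAG = "5"       # a tag: has an id and a title
TYPE_NOTE_TAG = "6"  # a link between one note and one tag


def tags_by_note(items):
    """Map each note id to the set of tag titles attached to it."""
    titles = {i["id"]: i["title"] for i in items if i.get("type_") == TYPE_TAG}
    result = defaultdict(set)
    for item in items:
        if item.get("type_") != TYPE_NOTE_TAG:
            continue
        title = titles.get(item["tag_id"])
        if title is not None:  # skip links that point to an unknown tag
            result[item["note_id"]].add(title)
    return dict(result)


# Invented sample data: two tags, three links, one of them dangling.
items = [
    {"id": "tag-1", "title": "keep-import", "type_": "5"},
    {"id": "tag-2", "title": "test", "type_": "5"},
    {"id": "link-1", "note_id": "note-1", "tag_id": "tag-1", "type_": "6"},
    {"id": "link-2", "note_id": "note-1", "tag_id": "tag-2", "type_": "6"},
    {"id": "link-3", "note_id": "note-2", "tag_id": "tag-9", "type_": "6"},
]
print(tags_by_note(items))
```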

But I'm not sure you need to worry about this. If you want to backup, the best way is to export a JEX archive of all your data.

Seems so. The range of possible types is defined here:

Thanks a lot for that - this is exactly what I was looking for.
It seems that there are as many type 6 .md files as there are combinations of tags and notes.

Time to code the batch file and break all my notes :slight_smile:

Thanks everyone for the help!

(I will also have a look at the headless API as it may be more robust in the long term)

Yes definitely. Sync is hard and if you directly modify the files on your sync target, you might make a mistake sooner or later, and you won't even realise it till things start to be a bit off in your local files. Like getting strange conflicts, or notes not being synced as you'd expect.

I think using the API might even be easier because you get back JSON objects that are easier to deal with, and more importantly it changes data in a way that doesn't break sync.

I'm interested in my notes for the long haul, so I want to know I have maximal control - with or without a working Joplin instance. And a JEX archive is just the sync archive tar'd, right, so you still need a working Joplin to interpret it?

If you don’t use encryption then the content of your notes is in plain text and attachments are also available (but renamed) inside a jex archive.

Yes you would. But even if both the CLI and desktop apps stop working some day, in the worst-case scenario you could still use any generic JS engine and run Joplin core (which doesn't rely on any particular GUI lib) to unserialise your files. The code to unserialise is relatively straightforward and has been ported to PHP too.

OK, I am convinced - thank you for following-up on that!

Update on my last comment: thank god (and Laurent) for the API. It is indeed a lifesaver and much, much easier to deal with than the raw files.

2020-05-08T18:14:59+0200 [hash2tag] INFO starting hash2tag
2020-05-08T18:14:59+0200 [hash2tag] DEBUG all current tags: {'smthg': '1bdcbbfc426945558b92a344113bd94b', 'integration': '7b60586d0df447d7bf1dfdf52efdaefe', 'test1': '355300462db840569ecad69f2e1f20e4', 'test2': '1117f9c749a042d399c7f6449b9e42bd', 'testnote1': 'dfee06f5-332e-4997-b25b-8d714f84b8a6', 'taestnote2': '9380fea9-7135-4944-8779-eacf686f20e5', 'world': '8fa57de8-cce6-423a-886a-c42170781730', 'wazaa': 'b96fa788-ea4c-41fa-aef0-b5dd0809a0e1', 'ahashtag': '9b271258-f9a5-49cd-b1a7-6e7c5ca65f9b', 'hello': 'a8ac54b3-a960-493f-bf40-1ec6a27abe00', 'test': 'a84c9f23-0bdf-4985-8282-90522b3259e5'}
2020-05-08T18:14:59+0200 [hash2tag] DEBUG found hashtags newhashtag,testnote1,taestnote2 in note "This is a test note"
2020-05-08T18:14:59+0200 [hash2tag] INFO new tag detected: newhashtag. Assigning id a1b29af3-565a-4180-8ede-be23becf5e43 and adding to tags
2020-05-08T18:14:59+0200 [hash2tag] DEBUG adding newhashtag to note 29947e4e28c74ff0b1a93b17fe543823
2020-05-08T18:14:59+0200 [hash2tag] DEBUG adding testnote1 to note 29947e4e28c74ff0b1a93b17fe543823
(...)
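The script itself is not posted in this thread, so the following is only a guess at the core of such a tool, matching the steps visible in the log: find the hashtags in a note body, then separate those that already exist as tags from those that have to be created. The regular expression is an assumption about what counts as a hashtag (a `#` followed directly by word characters, not preceded by a word character or another `#`), and both function names are made up.

```python
import re

# Assumption: "#word" is a hashtag; "# Heading" and "## Heading" are not,
# and neither is the fragment in something like "page#anchor".
HASHTAG_RE = re.compile(r"(?<![\w#])#(\w+)")


def extract_hashtags(body):
    """Return the hashtags found in a note body, in order, without repeats."""
    found = []
    for name in HASHTAG_RE.findall(body):
        if name not in found:
            found.append(name)
    return found


def split_known_and_new(hashtags, existing_tags):
    """Separate hashtags that already exist as tags from those to create.

    existing_tags maps tag title to tag id, like the dict in the log.
    """
    known = [h for h in hashtags if h in existing_tags]
    new = [h for h in hashtags if h not in existing_tags]
    return known, new


body = "# A heading\nSome text #newhashtag and #testnote1, then #taestnote2."
existing = {"testnote1": "id-1", "taestnote2": "id-2"}  # invented ids
hashtags = extract_hashtags(body)
print(split_known_and_new(hashtags, existing))
```

New tags would then be created through the API before being attached to the note, together with the ones that already exist.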

Hi WoJ, will you share your script somewhere? Seems useful!

@JosPolfliet Yes I will, they will be open sourced.

The intent right now (this may change) is to have

  • a script to extract #hashtags from the body of a note and to connect to the API (see below) to add them as tags, and to the note
  • a script to extract tags named tempX and delete notes which are older than X days
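For the second script, the interesting part is turning a tag name like temp7 into a retention period and deciding whether a note is past it. A small sketch under a few assumptions: the tag is literally `temp` followed by digits, the note's age is measured from a creation timestamp in milliseconds since the Unix epoch, and the helper names are invented. Which timestamp "older than X days" refers to is not stated above, so treat that choice as a placeholder.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: retention tags look like "temp7", "temp30", and so on.
TEMP_TAG_RE = re.compile(r"^temp(\d+)$")


def retention_days(tag_title):
    """Return X for a tag named tempX, or None for any other tag."""
    match = TEMP_TAG_RE.match(tag_title)
    return int(match.group(1)) if match else None


def is_expired(created_ms, days, now=None):
    """Tell whether a note created at created_ms is older than `days` days."""
    now = now or datetime.now(timezone.utc)
    created = datetime.fromtimestamp(created_ms / 1000, tz=timezone.utc)
    return now - created > timedelta(days=days)


# Example with fixed dates so the result does not depend on today's date.
now = datetime(2020, 5, 8, tzinfo=timezone.utc)
created_ms = int(datetime(2020, 4, 28, tzinfo=timezone.utc).timestamp() * 1000)
print(retention_days("temp7"), is_expired(created_ms, 7, now))
```

Since deleting notes cannot easily be undone, a dry-run mode that only logs what would be deleted seems a sensible first step.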

They can work standalone, connecting to the API on localhost.

This is not very practical for me, so there will also be a Docker image bundling the two scripts and an installation of Joplin (running as a server), all managed by supervisord.

All of this actually exists but this is the work of a week-end and while it works, it has several points missing:

  • no configuration, everything is hardcoded (but prepared for a possible configuration)
  • I copy an existing database.sqlite to avoid retrieving the API from the DB. I think I will end up doing that, though. This also means that there is a configuration step for the API.
  • there are no tests (maybe someday)

I will have it running for a few days to catch edge cases.

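A # at the start of a line marks a header in markdown (strictly speaking, CommonMark requires a space after the #, but not every renderer does). It’s fine if you’re not using markdown but might cause issues for others.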