Where are tags stored?

WoJ · 6 May 2020 20:42

There were some discussions in the past about the ability to use a #hashtag as a quick way to include tags in a note (Store tags inside notes). The enthusiasm was limited so I am considering writing a batch processor which would take all my notes, extract the tags and add them to Joplin.

I did not dig into the problem yet but a quick glance over the files did not show an obvious place with the tags (to be frank, it means that they did not wave to me saying that they are there - I really did not look too closely yet).

Is there a place where tag generation and storage are documented?

laurent · 6 May 2020 21:16

For scripting, the best option is to use the API: https://joplinapp.org/api. See in particular the /tags endpoint.
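As a rough illustration of the scripted route, the sketch below builds the calls that list tags and attach a tag to a note. It assumes the API listens on http://localhost:41184 and takes the token as a `token` query parameter; adjust both if your setup differs. The helper names (`build_url`, `unwrap_items`, `call`, `tag_ids_by_title`, `add_tag_to_note`) are made up for this example. The shape of list responses has differed between Joplin versions, so `unwrap_items` accepts both a bare list and an `{"items": [...]}` wrapper, and `tag_ids_by_title` only reads the first page if results are paginated.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:41184"  # assumption: local API on its default port


def build_url(path, token, **params):
    """Return BASE_URL + path with the token and any extra query parameters."""
    query = urllib.parse.urlencode({"token": token, **params})
    return f"{BASE_URL}/{path.lstrip('/')}?{query}"


def unwrap_items(payload):
    """Accept either a bare JSON list or a paginated {'items': [...]} object."""
    if isinstance(payload, dict):
        return payload.get("items", [])
    return payload


def call(method, path, token, body=None, **params):
    """Send one request to the API and decode the JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        build_url(path, token, **params), data=data, method=method
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def tag_ids_by_title(token):
    """Map each tag title to its id (first page only if results are paginated)."""
    return {t["title"]: t["id"] for t in unwrap_items(call("GET", "tags", token))}


def add_tag_to_note(token, tag_id, note_id):
    """Attach an existing tag to a note via POST /tags/:id/notes."""
    return call("POST", f"tags/{tag_id}/notes", token, body={"id": note_id})
```

Only `call` touches the network; the two small helpers can be checked offline before pointing the script at a real profile.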

WoJ · 6 May 2020 21:50

I saw that possibility, but AFAICT the API is only available on “volatile” devices (such as a PC), while I would like the batch to run continuously (so either against an API running on my server, or directly on the files).

laurent · 6 May 2020 22:20

Not sure if that kind of automation is currently possible. Perhaps the headless server of the CLI client might help you get there:

server <command>

    Start, stop or check the API server. To specify on which port it should 
    run, set the api.port config variable. Commands are (start|stop|status). 
    This is an experimental feature - use at your own risks! It is recommended 
    that the server runs off its own separate profile so that no two CLI 
    instances access that profile at the same time. Use --profile to specify 
    the profile path.

tessus · 7 May 2020 01:16

One option would be to run the cli server on a headless machine and create an ssh tunnel.

Rob · 7 May 2020 07:00

I would also like to know where the tags are stored, just so I understand my own data security in the event of e.g. data corruption that no-one here can help me with. Indeed, perhaps data corruption at some future point when Joplin is no longer actively maintained.

roman_r_m · 7 May 2020 07:39

It’s all (pseudo)markdown. If you look in your sync target (Dropbox/WebDAV/whatever) you’ll see a bunch of .md files. Here’s an example:

keep-import

id: 44d208719e844337b2b84d8e55391012
created_time: 2020-02-29T18:09:35.814Z
updated_time: 2020-02-29T18:09:35.814Z
user_created_time: 2020-02-29T18:09:35.814Z
user_updated_time: 2020-02-29T18:09:35.814Z
encryption_cipher_text: 
encryption_applied: 0
is_shared: 0
type_: 5

type_: 5 here means it’s a tag.
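For anyone who wants to read these files without Joplin, here is a minimal sketch of a parser for the layout shown above. It assumes the `key: value` properties always form the last block of the file, separated from the title and body by a blank line; `parse_item` is a name invented for this example, not Joplin's own code.

```python
def parse_item(text):
    """Split a serialized item into title, body and a dict of properties."""
    lines = text.rstrip("\n").split("\n")
    props = {}
    # Properties are the last block of the file: read upwards until a blank line.
    while lines and lines[-1].strip():
        key, _, value = lines.pop().partition(":")
        props[key.strip()] = value.strip()
    # Drop the blank separator(s) between the content and the properties.
    while lines and not lines[-1].strip():
        lines.pop()
    # Whatever is left is the title, a blank line, then the body (if any).
    title = lines[0] if lines else ""
    body = "\n".join(lines[2:]) if len(lines) > 2 else ""
    return {"title": title, "body": body, "props": props}


if __name__ == "__main__":
    sample = "keep-import\n\nid: 44d208719e844337b2b84d8e55391012\ntype_: 5"
    item = parse_item(sample)
    print(item["title"], item["props"]["type_"])
```

Values are kept as strings, so a tag is recognised by `props["type_"] == "5"`.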

Rob · 7 May 2020 08:08

Ah, I see. So each tag has its own .md file containing its string and an id.

Then each note-has-a-tag instance is an .md file of type 6, like this, right?

id: 61d5ac29f8bf4e3ba78988fa0fcbca57
note_id: 89a2a47379494b05a99e8e205fdcbe8a
tag_id: 43ad0205f4044238a85e49a9b0598dbe
created_time: 2020-04-28T10:29:57.171Z
updated_time: 2020-04-28T10:29:57.171Z
user_created_time: 2020-04-28T10:29:57.171Z
user_updated_time: 2020-04-28T10:29:57.171Z
encryption_cipher_text:
encryption_applied: 0
is_shared: 0
type_: 6

laurent · 7 May 2020 08:16

Yes that's correct.

But I'm not sure you need to worry about this. If you want to backup, the best way is to export a JEX archive of all your data.

roman_r_m · 7 May 2020 08:17

Seems so. The range of possible types is defined here:

https://github.com/laurent22/joplin/blob/8cd26c938050877ca32b39e56d40ece37bc99056/ReactNativeClient/lib/BaseModel.js#L559


	static db() {
		if (!this.db_) throw new Error('Accessing database before it has been initialised');
		return this.db_;
	}


	static isReady() {
		return !!this.db_;
	}
}


BaseModel.typeEnum_ = [['TYPE_NOTE', 1], ['TYPE_FOLDER', 2], ['TYPE_SETTING', 3], ['TYPE_RESOURCE', 4], ['TYPE_TAG', 5], ['TYPE_NOTE_TAG', 6], ['TYPE_SEARCH', 7], ['TYPE_ALARM', 8], ['TYPE_MASTER_KEY', 9], ['TYPE_ITEM_CHANGE', 10], ['TYPE_NOTE_RESOURCE', 11], ['TYPE_RESOURCE_LOCAL_STATE', 12], ['TYPE_REVISION', 13], ['TYPE_MIGRATION', 14], ['TYPE_SMART_FILTER', 15]];


for (let i = 0; i < BaseModel.typeEnum_.length; i++) {
	const e = BaseModel.typeEnum_[i];
	BaseModel[e[0]] = e[1];
}


BaseModel.db_ = null;
BaseModel.dispatch = function() {};
BaseModel.saveMutexes_ = {};

WoJ · 7 May 2020 08:42

Thanks a lot for that - this is exactly what I was looking for.
It seems that there are as many type 6 .md files as there are combinations of tags and notes.
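To make the relationship concrete, the sketch below joins type 5 (tag) and type 6 (note-tag) records into a mapping from note id to tag titles. It assumes each file has already been read into a flat dict with string values (`id`, `type_`, plus `title` for tags and `note_id`/`tag_id` for links); `tags_per_note` is a hypothetical helper name.

```python
from collections import defaultdict

TYPE_TAG = "5"       # a tag: has an id and a title
TYPE_NOTE_TAG = "6"  # a link: one record per (note, tag) combination


def tags_per_note(items):
    """Return {note_id: sorted list of tag titles} from a list of item dicts."""
    titles = {i["id"]: i["title"] for i in items if i["type_"] == TYPE_TAG}
    result = defaultdict(list)
    for i in items:
        # Skip links that point at a tag we have no record for.
        if i["type_"] == TYPE_NOTE_TAG and i["tag_id"] in titles:
            result[i["note_id"]].append(titles[i["tag_id"]])
    return {note: sorted(tags) for note, tags in result.items()}
```

A note carrying three tags therefore shows up as three separate type 6 records, each pointing at the same `note_id`.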

Time to code the batch file and break all my notes

Thanks everyone for the help!

(I will also have a look at the headless API as it may be more robust in the long term)

laurent · 7 May 2020 10:50

Yes definitely. Sync is hard and if you directly modify the files on your sync target, you might make a mistake sooner or later, and you won't even realise it till things start to be a bit off in your local files. Like getting strange conflicts, or notes not being synced as you'd expect.

I think using the API might even be easier because you get back JSON objects that are easier to deal with, and more importantly it changes data in a way that doesn't break sync.

Rob · 8 May 2020 07:03

I'm interested in my notes for the long haul, so I want to know I have maximal control - with or without a working Joplin instance. And a JEX archive is just the sync archive tar'd, right, so you still need a working Joplin to interpret it?

roman_r_m · 8 May 2020 07:14

If you don’t use encryption then the content of your notes is in plain text and attachments are also available (but renamed) inside a jex archive.
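As a sketch of what that looks like in practice, and assuming a JEX file is a plain tar archive as suggested above, the Python standard library is enough to pull the text items out of it. `read_jex_items` is a name chosen for this example.

```python
import io
import tarfile


def read_jex_items(fileobj):
    """Yield (member name, text) for every .md entry in a tar archive."""
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        for member in archive:
            # Attachments and directories are skipped; only text items are read.
            if member.isfile() and member.name.endswith(".md"):
                data = archive.extractfile(member).read()
                yield member.name, data.decode("utf-8")


if __name__ == "__main__":
    # Build a tiny archive in memory so the example runs without a real export.
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        payload = b"keep-import\n\nid: 44d208719e844337b2b84d8e55391012\ntype_: 5"
        info = tarfile.TarInfo("44d208719e844337b2b84d8e55391012.md")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    for name, text in read_jex_items(buffer):
        print(name, len(text))
```

With a real export, pass `open("export.jex", "rb")` instead of the in-memory buffer (the file name is only an example).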

laurent · 8 May 2020 09:45

Yes you would. But even if both the CLI and desktop apps stop working some day, in the worst-case scenario you could still use any generic JS engine and run Joplin core (which doesn't rely on any particular GUI lib) to unserialise your files. The code to unserialise is relatively straightforward and has been ported to PHP too.

WoJ · 8 May 2020 10:06

OK, I am convinced - thank you for following-up on that!

WoJ · 8 May 2020 16:19

Update on my last comment: thanks ~~god~~ Laurent for the API. It is indeed a lifesaver and much, much easier to deal with than the raw files.

2020-05-08T18:14:59+0200 [hash2tag] INFO starting hash2tag
2020-05-08T18:14:59+0200 [hash2tag] DEBUG all current tags: {'smthg': '1bdcbbfc426945558b92a344113bd94b', 'integration': '7b60586d0df447d7bf1dfdf52efdaefe', 'test1': '355300462db840569ecad69f2e1f20e4', 'test2': '1117f9c749a042d399c7f6449b9e42bd', 'testnote1': 'dfee06f5-332e-4997-b25b-8d714f84b8a6', 'taestnote2': '9380fea9-7135-4944-8779-eacf686f20e5', 'world': '8fa57de8-cce6-423a-886a-c42170781730', 'wazaa': 'b96fa788-ea4c-41fa-aef0-b5dd0809a0e1', 'ahashtag': '9b271258-f9a5-49cd-b1a7-6e7c5ca65f9b', 'hello': 'a8ac54b3-a960-493f-bf40-1ec6a27abe00', 'test': 'a84c9f23-0bdf-4985-8282-90522b3259e5'}
2020-05-08T18:14:59+0200 [hash2tag] DEBUG found hashtags newhashtag,testnote1,taestnote2 in note "This is a test note"
2020-05-08T18:14:59+0200 [hash2tag] INFO new tag detected: newhashtag. Assigning id a1b29af3-565a-4180-8ede-be23becf5e43 and adding to tags
2020-05-08T18:14:59+0200 [hash2tag] DEBUG adding newhashtag to note 29947e4e28c74ff0b1a93b17fe543823
2020-05-08T18:14:59+0200 [hash2tag] DEBUG adding testnote1 to note 29947e4e28c74ff0b1a93b17fe543823
(...)
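The extraction step behind a log like this could look roughly as follows. This is a sketch of the idea, not the actual hash2tag code: the regular expression and both function names are assumptions. The pattern only accepts a `#` that starts a word and is followed directly by word characters, so a `# Heading` line with a space after the hash and a URL fragment such as `/#frag` are both ignored.

```python
import re

# '#' at the start of the text or after whitespace, followed by word characters.
HASHTAG = re.compile(r"(?<!\S)#(\w+)")


def extract_hashtags(body):
    """Return the distinct hashtags in a note body, in order of appearance."""
    seen = []
    for match in HASHTAG.finditer(body):
        tag = match.group(1)
        if tag not in seen:
            seen.append(tag)
    return seen


def split_known_and_new(hashtags, existing):
    """Separate hashtags that already exist as tags from those to be created.

    existing: dict mapping tag title to tag id.
    """
    known = {t: existing[t] for t in hashtags if t in existing}
    new = [t for t in hashtags if t not in existing]
    return known, new
```

The new titles would then be created as tags first, after which every hashtag can be attached to the note by id.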

JosPolfliet · 10 May 2020 19:16

Hi WoJ, will you share your script somewhere? Seems useful!

WoJ · 10 May 2020 20:32

@JosPolfliet Yes I will, they will be open sourced.

The intent right now (this may change) is to have

- a script to extract #hashtags from the body of a note and to connect to the API (see below) to create them as tags and attach them to the note
- a script to find tags named tempX and delete the tagged notes which are older than X days
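The tempX rule can be sketched as below. The assumptions are that X is a whole number of days, that the note's age is measured from a timestamp in milliseconds since the Unix epoch, and that the caller decides which timestamp to use; `retention_days` and `is_expired` are hypothetical names.

```python
import re
from datetime import datetime, timedelta, timezone

TEMP_TAG = re.compile(r"^temp(\d+)$")


def retention_days(tag_title):
    """Return X for a tag named tempX, or None for any other tag."""
    match = TEMP_TAG.match(tag_title)
    return int(match.group(1)) if match else None


def is_expired(timestamp_ms, tag_title, now=None):
    """True if the note is older than the number of days encoded in the tag."""
    days = retention_days(tag_title)
    if days is None:
        return False
    now = now or datetime.now(timezone.utc)
    created = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return now - created > timedelta(days=days)
```

Keeping the decision in a pure function makes it easy to dry-run against a list of notes before anything is actually deleted.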

They can work standalone, connecting to the API on localhost.

This is not very practical for me, so there will also be a Docker image bundling the two scripts and an installation of Joplin (running as a server), all managed by supervisord.

All of this actually exists but this is the work of a week-end and while it works, it has several points missing:

- no configuration, everything is hardcoded (but prepared for a possible configuration)
- I copy an existing database.sqlite to avoid retrieving the API from the DB. I think I will end up doing that, though. This also means that there is a configuration step for the API.
- there are no tests (maybe someday)

I will have it running for a few days to catch edge cases.

roman_r_m · 10 May 2020 21:06

#hashtag is a header in markdown. It’s fine if you’re not using markdown but might cause issues for others.
