Deleting all tags that have only one note

I want to delete all tags that are attached to a single note. I'm thinking of doing it using SQL like

delete from tags t1
where t1.id in (
	select * from tags t
	where (
		select count(*) 
		from note_tags nt 
		where nt.tag_id = t.id
	) <= 1
) cascade;

What are the consequences besides removing them? Will Joplin app still work? Will the synchronization still work? Will it synchronize the changes to other devices? I sync with another desktop and mobile.

Editing the dabase is risky!
Changes do not lead to a sync, because they are bypassing all Joplin processes inkl. sync and updating coresponding tables and can lead to a malefunction of Joplin. Better is to use the data API.

OK, that's good to know! You probably just saved me from a headache :slight_smile:. So can I use Python to do that?

Sure, here a some old python examples

Nice, I can see that those scripts are from 2 yrs ago, and the API probably has changed already but I'll figure it out. Thanks a lot :slight_smile:

No the API is the same, the big API change was in version v1.4.11

If you still decide to use SQL, I think you also need to insert corresponding records into the deleted_items table to make sure tags will get deleted on the server on next sync.

If you still decide to use SQL, I think you also need to insert corresponding records into the deleted_items table to make sure tags will get deleted on the server on next sync.

What would be the item_type though? I guess 4 is a note, but I don't see any dict for it. And what would be the sync_target? I guess item_id is just a tag id.

  1. See https://github.com/laurent22/joplin/blob/64f35d035ee4ca40b8071ea7e0e0d3701112ed85/packages/lib/BaseModel.ts#L15

Depends on what you're syncing to.

Or you can just try deleting a tag and then check the table.

A python solution could be:

import os

from joppy.api import Api

# Create a new Api instance.
api = Api(token=os.getenv("API_TOKEN"))

# Iterate through all tags.
for tag in api.get_all_tags():

    # Find all notes with this tag.
    tagged_notes = api.get_all_notes(tag_id=tag["id"])

    # Delete all tags assigned to one or less notes.
    if len(tagged_notes) <= 1:
        print("Removing tag", tag["title"])
        api.delete_tag(tag["id"])

Prerequisites: pip install joppy and specifying the API token.

I tested it only with a minimal example, so it might contain bugs. And it should be slower than accessing the DB directly.

Unfortunately, it doesn't work because there's no token parameter. I cannot figure out how to use it :frowning:
image

Interesting, the IDE shows the inherited classes rather than the actual signature. It should be class Api(token: str, url: str="http://localhost:41184"). The easiest option would be to hardcode your api token:

from joppy.api import Api
api = Api(token="YOUR_TOKEN")

I like to use an environment variable instead, like in the example. It requires a bit more code, but then you have to specify your token only once for all scripts.

Thanks, now it works!

1 Like

But the number of tags Python wants to delete doesn't match the number of tags I query from the database.

Python wants to delete 946 tags, SQL query says there are 744 tags with <=1 notes and I have 1296 tags in total (total matches in both ways).

and db says I have 3 orphaned tags (queried as

select count(*) from tags t 
where t.id not in (
	select nt.tag_id from note_tags nt 
);

Interesting. I'm not sure where the difference originates, since I never accessed Joplin's DB directly. I think the best way to debug it would be to determine one tag that is in the Python list, but not in the SQL list and manually check which one is correct in this use case.

@Marph I did it. So what happens is I have a lot of duplicated tags, ie. tags with the same name but different ids

image

It seems that 158 tags among the tags that have <=1 notes are duplicates. I found this number by removing duplicates from Python list and from db list and subtracting (674-516 = 158). 744+158 = 902 ≠ 946, so maybe some tags are orphaned, tripled, quadrupled or doubly single-note, idk.

There are only additional tags in the Python list, ie. if a tag is on the db list then it is on Python list. Some of the additional tags have single note, but some of them have multiple notes I checked in Joplin and don't have a doppelgänger tag with the same title. But interestingly there are tags that should be on either list but they are not on the other, which is true for both of them.

1 Like

It seems I forgot to post my solution, here it is: remove duplicated tags from Joplin · GitHub

1 Like