Fix approach for #14540: Duplicate tags during sync - feedback needed

Hi,

Looking into issue #14540 -- duplicate tags created during sync.

Note: This issue was reported as far back as 2019 and marked fixed,
but it has resurfaced in 2026, suggesting the root cause was never
fully addressed.

Root cause

It's a race condition:

  • Client 1 creates "meetings" and syncs
  • Client 2 creates "meetings" before receiving that sync
  • Both end up with different UUIDs for the same tag
  • Both sync, now all devices have two "meetings" tags

A Tag.save() guard alone won't fix this -- Client 2 hasn't received
Client 1's tag yet when it creates its own, so a local lookup finds
nothing to reuse. Two PRs (#14583, #14598) have already tried this
approach, but it doesn't address the core issue.
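
To make the race concrete, here is a minimal in-memory sketch. Everything in it (`SimClient`, `newId`, the merge-by-ID `receive` step) is a hypothetical stand-in and not Joplin's actual sync code; it only assumes that sync merges tags by ID and that each client checks its own local tags before creating one.

```typescript
// Hypothetical in-memory model of two clients; not Joplin's real sync code.
interface TagRow { id: string; title: string; }

let nextId = 0;
// Stand-in for random UUID generation: every call yields a fresh ID.
const newId = (): string => `uuid-${++nextId}`;

class SimClient {
  tags: TagRow[] = [];

  // Local guard: reuse an existing tag with the same title, if there is one.
  createTag(title: string): TagRow {
    const existing = this.tags.find(t => t.title === title);
    if (existing) return existing;
    const tag: TagRow = { id: newId(), title };
    this.tags.push(tag);
    return tag;
  }

  // Assumed sync behaviour: merge by ID only, so two IDs with one title both survive.
  receive(remote: TagRow[]): void {
    for (const tag of remote) {
      if (!this.tags.some(t => t.id === tag.id)) this.tags.push({ ...tag });
    }
  }
}

const client1 = new SimClient();
const client2 = new SimClient();
client1.createTag('meetings'); // guard passes: no local "meetings" yet
client2.createTag('meetings'); // guard passes here too: sync has not run
client2.receive(client1.tags);
client1.receive(client2.tags);

const duplicatesOnClient1 = client1.tags.filter(t => t.title === 'meetings');
console.log(duplicatesOnClient1.length); // 2
```

The guard does its job on each device in isolation, yet both devices still end up with two "meetings" tags after syncing.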

Proposed fix

A post-sync deduplication pass -- after each sync completes:

  1. Find tags sharing the same title (case-insensitive)
  2. Keep the oldest (lowest created_time) as canonical
  3. Reassign all note_tags pointing to duplicates to the canonical ID
  4. Delete the duplicate tags

No DB migration or schema changes needed. This would also fix
existing users who already have duplicates.
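
A rough sketch of the four steps, working on plain arrays rather than the real database. The `TagRow` and `NoteTagRow` shapes and the `dedupeTags` function are assumptions made for illustration, and the title comparison is left as a parameter because whether it should ignore case is a separate question.

```typescript
// Hypothetical row shapes; the real tables have more columns.
interface TagRow { id: string; title: string; created_time: number; }
interface NoteTagRow { note_id: string; tag_id: string; }

function dedupeTags(tags: TagRow[], noteTags: NoteTagRow[], caseSensitive: boolean) {
  const key = (t: TagRow): string => (caseSensitive ? t.title : t.title.toLowerCase());

  // Steps 1 and 2: group by title and keep the oldest tag as canonical.
  const canonical = new Map<string, TagRow>();
  for (const tag of tags) {
    const current = canonical.get(key(tag));
    if (!current || tag.created_time < current.created_time) canonical.set(key(tag), tag);
  }

  // Step 3: point every note_tag at the canonical ID, dropping pairs that
  // would become duplicates (a note that carried both copies of the tag).
  const seen = new Set<string>();
  const mergedNoteTags: NoteTagRow[] = [];
  for (const nt of noteTags) {
    const tag = tags.find(t => t.id === nt.tag_id);
    const targetId = tag ? canonical.get(key(tag))!.id : nt.tag_id;
    const pairKey = `${nt.note_id}:${targetId}`;
    if (seen.has(pairKey)) continue;
    seen.add(pairKey);
    mergedNoteTags.push({ note_id: nt.note_id, tag_id: targetId });
  }

  // Step 4: keep only the canonical tags.
  const keptTags = tags.filter(t => canonical.get(key(t))!.id === t.id);
  return { tags: keptTags, noteTags: mergedNoteTags };
}

const result = dedupeTags(
  [
    { id: 'a', title: 'meetings', created_time: 100 },
    { id: 'b', title: 'meetings', created_time: 200 },
    { id: 'c', title: 'Meetings', created_time: 300 },
  ],
  [
    { note_id: 'n1', tag_id: 'a' },
    { note_id: 'n1', tag_id: 'b' },
    { note_id: 'n2', tag_id: 'b' },
  ],
  true, // exact-title matching
);
console.log(result.tags.map(t => t.id)); // [ 'a', 'c' ]
```

Every reassigned or deleted row here would be a change that has to sync to the other devices, which is where the re-sync load question comes from.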

Questions before I open a PR

  1. Is post-sync deduplication the right direction?
  2. Any concerns about re-sync load when merging note_tags?

Thanks

Regarding "Find tags sharing the same title (case-insensitive)": now that we have case-sensitive tags, this deduplication shouldn't be case-insensitive.

Got the point, thanks!

So the deduplication should be case-sensitive -- only merge tags whose titles are exactly identical, not merely the same when lowercased.

"Meetings" and "meetings" should be treated as separate tags if case-sensitive tags are supported.

Do you know when case-sensitive tags were introduced? That would also help understand if existing duplicates in older databases are always exact matches or could differ in case.

It was added in version 3.5.4, from this PR: Desktop, Mobile: Add support for mixed case tags by mrjo118 · Pull Request #12931 · laurent22/joplin · GitHub

@Kaushalendra-Marcus Did you read my comment here? Tagging a note with an existing tag creates a duplicate tag in list · Issue #14540 · laurent22/joplin · GitHub

We don’t want to be doing post-sync deduplication just for the sake of a rare race condition which can be easily fixed manually.

Thanks @mrjo118, that's clear.

So to summarize: the Tag.save() guard is a partial improvement but not a complete fix, post-sync dedup has re-sync cost concerns, and ultimately this may be Laurent's call on whether to fix it at all.

I'll wait for maintainer direction before opening another PR.

Thanks @bwat47! So duplicates before v3.5.4 could differ in case; after that they would be exact matches, thanks to @mrjo118's PR #12931.
Good to know for the dedup logic.

Since the issue is that the UUIDs of the duplicate tags are different, could you fix it by making the generation of that UUID deterministic based on the tag name?

Interesting idea! If the UUID is deterministic based on the tag name, both clients would generate the same UUID and the conflict wouldn't happen at all. The main concern is what happens if the user renames the tag: the UUID would need to change on rename, and this could affect existing notes because their tag references would no longer match. Worth exploring though.

What if you just didn’t update the UUID in that case? So the deterministic UUID is only enforced at tag creation, but not afterwards. That way existing tags don’t have to be updated either.

There would still be the issue of making tag A, renaming it to tag B, then making a new tag A again, which would then try to make the same UUID that tag B now has. But maybe you could seed an RNG with the name and take the first output that results in an unused UUID?
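
A small sketch of that idea, under loud assumptions: instead of a seeded RNG it hashes the title together with an attempt counter (a plain substitution that gives the same "first unused output" behaviour), it uses a non-cryptographic FNV-1a hash purely for illustration, and the 32-hex-character ID format and the names `candidateId` and `deterministicTagId` are made up for this example.

```typescript
// FNV-1a 32-bit hash with a seed mixed into the offset basis.
// Illustration only; a real implementation would want a proper hash.
function fnv1a(input: string, seed: number): number {
  let hash = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Build a 32-hex-character ID from four seeded hashes of the title.
function candidateId(title: string, attempt: number): string {
  let id = '';
  for (let part = 0; part < 4; part++) {
    const hex = fnv1a(title, attempt * 4 + part).toString(16);
    id += ('0000000' + hex).slice(-8);
  }
  return id;
}

// Take the first candidate that is not already used by another tag.
function deterministicTagId(title: string, usedIds: Set<string>): string {
  for (let attempt = 0; ; attempt++) {
    const id = candidateId(title, attempt);
    if (!usedIds.has(id)) return id;
  }
}

// Tag "A" was created, then renamed to "B" while keeping its ID.
const usedIds = new Set<string>([candidateId('A', 0)]);
// Creating a fresh "A" skips the taken ID and falls through to the next candidate.
const secondA = deterministicTagId('A', usedIds);
console.log(secondA === candidateId('A', 1)); // true
```

Two clients only agree on the result if they also agree on which candidate IDs are already taken, so this narrows the race rather than removing it in every case.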

That could potentially work, but it would add complexity to things like renaming a tag and moving between tags, as you would need to match all tags with a matching title for all these operations.

Fair point @mrjo118.
What about handling this at sync time? When an incoming tag title matches an existing local tag, merge it on the spot rather than saving it as new. This would only trigger when a conflict actually occurs, so no separate dedup pass is needed.

Merging at sync time is still going to create additional changes to sync, because any change you make for incoming tags matching the title needs to be synced back down to the other devices.

Seems like you’d want to modify the local tags to match the synced tags, otherwise you could get a never-ending loop of two ends always syncing the same tag back and forth with their own different preferred UUIDs. And if you do that, there should be nothing that needs to be synced back to other devices.

@ntczkjfg that makes sense, adopt the incoming tag's UUID locally rather than the other way around. No changes need to sync back, so no loop.
@mrjo118 does that address your concern?
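
Roughly what I have in mind, as a sketch on in-memory state only. `LocalState` and `applyIncomingTag` are hypothetical names, and it assumes the clashing local tag has not been uploaded yet and ignores the order in which tags and note_tags are processed during sync.

```typescript
// Hypothetical shapes; not the real Joplin models.
interface TagRow { id: string; title: string; }
interface NoteTagRow { note_id: string; tag_id: string; }
interface LocalState { tags: TagRow[]; noteTags: NoteTagRow[]; }

// Apply one incoming tag. If a local tag has the same title but another ID,
// the local side adopts the incoming ID instead of keeping its own.
function applyIncomingTag(state: LocalState, incoming: TagRow): LocalState {
  const clash = state.tags.find(t => t.title === incoming.title && t.id !== incoming.id);
  if (!clash) {
    const known = state.tags.some(t => t.id === incoming.id);
    return known ? state : { ...state, tags: [...state.tags, incoming] };
  }
  return {
    // Drop the local duplicate and keep a single copy of the incoming tag.
    tags: [...state.tags.filter(t => t.id !== clash.id && t.id !== incoming.id), incoming],
    // Repoint local note_tags from the dropped ID to the incoming ID.
    noteTags: state.noteTags.map(nt =>
      nt.tag_id === clash.id ? { ...nt, tag_id: incoming.id } : nt,
    ),
  };
}

const before: LocalState = {
  tags: [{ id: 'local-1', title: 'meetings' }],
  noteTags: [{ note_id: 'n1', tag_id: 'local-1' }],
};
const after = applyIncomingTag(before, { id: 'remote-9', title: 'meetings' });
console.log(after.tags.map(t => t.id)); // [ 'remote-9' ]
console.log(after.noteTags[0].tag_id);  // remote-9
```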

I don’t think there is a guarantee that a new tag locally is going to be processed by the sync before any associated NoteTags, and I don’t think it’s worth adding additional complexity to the sync around this.

FYI someone already has an open PR for this: fix: Prevent duplicate tags via NFC normalization and improved lookup logic by itisrohit · Pull Request #14599 · laurent22/joplin · GitHub

I haven’t looked at the code yet, but that approach looks to normalise tags only when changing a tag association or updating a tag (I haven’t checked if either or both are the case). That seems more acceptable than normalising as soon as duplicate tags are detected upon sync.

Looking at #14599, I think the concerns raised by @mrjo118 point to two separate problems worth splitting:

  1. NFC normalization on save: this seems uncontroversial and low risk, just ensures new tags are stored consistently.
  2. The fallback healing lookup: this is where the merge behavior, orphan risk, and the uppercase/lowercase regression from #12931 come in.

Would it make sense to land just the normalization part first as a minimal safe fix, and handle the healing/dedup logic separately once there's more clarity on the edge cases?

To be honest I didn’t know that NFC was referring to a type of normalisation. It does seem separate from the main issue, but if you’re going to tackle this individually then the PR should be created for a proven issue which has been triaged.

Since #14540 is already open and marked high priority, would a minimal PR that only adds trim().normalize('NFC') in Tag.save() be worth pursuing? No healing logic, no dedup, just consistent normalization on save.
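
For reference, a tiny sketch of what that normalization does. `normalizeTagTitle` is a hypothetical helper name, and where exactly it would be called inside Tag.save() is not shown here.

```typescript
// Hypothetical helper: trim whitespace and apply Unicode NFC normalization.
function normalizeTagTitle(title: string): string {
  return title.trim().normalize('NFC');
}

const composed = 'caf\u00e9';     // "é" as a single code point
const decomposed = 'cafe\u0301 '; // "e" + combining acute accent, trailing space

console.log(composed === decomposed.trim());                                // false
console.log(normalizeTagTitle(composed) === normalizeTagTitle(decomposed)); // true
```

Both strings look identical on screen, but without normalization they compare as different titles.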

Feel free to raise a separate issue for this, but I doubt it would be considered high priority raised separately, and you should not work on your own issue unless it has been triaged. Also if you do raise an issue for this, you must prove it is an issue with reproduction steps, and not just assert it could be a problem