Tags - Lower Case Only?

@whitewall Actually, my position on the matter is the one I gave in post #30 of this thread (Tags - Lower Case Only? - #30 by thebodzio) :blush:. Plus, I don't care much about tags being case-sensitive in the sense of case being their distinguishing factor (e.g. I consider “html” and “HTML” to be the same for processing purposes, but not for presentational ones).

Of course, it will make things more complicated, but I don't think it will be “extra” :blush:. What I mean is that Joplin's tag system is already ID-based (which I applaud as an intelligent design decision), so almost all tag processing is done using IDs, not tag names (or “tag titles”, as they are called in the source). Of course, tag title editing has to deal with “names”, but that's seemingly the only place where changes might be necessary.

There is something to it, but on the other hand, why not give yourself a little bit of extra validation? It doesn't seem like a big deal to retain. Want to have consistently lower-case-only tags? Fine! We'll make sure you won't accidentally break that convention. Want to use both lower- and upper-case letters? No problem! We won't check or convert the case for you.

All in all, whenever we search for a tag title, we won't consider letter case. In other words: let Joplin remain case-insensitive in all tag-related areas except tag presentation.

Actually, when I look at the code, all the required elements appear to be in place already. I mean: loadByField(fieldName: string, fieldValue: any, options: any = null) already looks up a tag title ignoring case (COLLATE NOCASE). This doesn't have to change at all. The places that appear to need changing are in lib/models/Tag.ts, namely tagTitles[i].trim().toLowerCase(); and save(o: TagEntity, options: any = null) (line 193, same file). Based on an option, either convert the given tag titles to lower case or don't. That seems to be the only place where they are converted. Let's just call that option “Force lower-case” and… job done? Really, that's all I can see in the code: the two methods mentioned are the only places where toLowerCase() is used in the entire Tag class definition. If I'm mistaken, please do correct me.
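To make the idea concrete, here is a minimal sketch of what an option-driven conversion could look like. It is not Joplin's actual code: the TagTitleOptions interface, the forceLowerCase option name and both function names are assumptions made up for illustration.

```typescript
// Hypothetical sketch, not taken from lib/models/Tag.ts.
interface TagTitleOptions {
  // Assumed option name standing in for a "Force lower-case" setting.
  forceLowerCase: boolean;
}

// Trim the title and lower-case it only when the option is enabled.
function normalizeTagTitle(title: string, options: TagTitleOptions): string {
  const trimmed = title.trim();
  return options.forceLowerCase ? trimmed.toLowerCase() : trimmed;
}

// Case-insensitive comparison, roughly what a COLLATE NOCASE lookup gives.
// Note: SQLite's built-in NOCASE folds ASCII letters only, whereas
// toLowerCase() also folds non-ASCII letters, so the two can differ.
function sameTag(a: string, b: string): boolean {
  return a.trim().toLowerCase() === b.trim().toLowerCase();
}
```

With the option off, " HTML " would be stored as "HTML", yet sameTag("html", "HTML") would still treat the two as one tag for lookup purposes.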

I'm not arguing for anything else :blush:. Perhaps the two types of “case-sensitiveness” make things confusing.

Enough chit-chat! Let's codé :wink:

Looks like another thread being born :blush:.

You want validation? Well here you go!

[Seriously, I agree with everything you are saying. Unfortunately I don't know how to code, only know how to make silly graphics]

:+1::grin::+1:

@thebodzio, it seems you've looked into the issue, so if you have some ideas on how to implement your suggestion, feel free to submit a pull request. All I remember is that the last time I looked into it, it seemed easy but ended up being very complicated and I gave up, but maybe I looked at it wrong.

I'd love to do it, and I would have already if it weren't for other pre-existing commitments :blush:. Anyway, I'll take a closer look at this issue, time permitting.

I agree completely that it “seems to be easy”, but I'm neither so blindly optimistic nor so naive as to rely completely on that impression :wink:. Once I've done some testing, I'll be sure to share the results and, hopefully, make a pull request.

Being German, having German notes, I want to have tags that are spelled correctly, and case is important in German.

On the matter of ‘industry standards’: OneNote keeps case and does not force lower-case.

I have followed this discussion for a while, and yes, "piefel" is right. I would like to add some additional thoughts:

In German, lower case (verbs, adjectives, adverbs) and upper case (nouns and names) carry different meanings.

For testing, I migrated an Evernote database with 12GB of notes, and I changed all tags to lower case just to test whether I can work with it. This is horrible. E.g. I have an author called "Golden" (an old Jewish surname) and I have a color "gold". As a German reader I can see the difference without thinking about the word. If both words are written in lower case, "golden" and "gold", the semantic distinction is gone. Both mean the color gold.

This is grammar (lower-case nouns are wrong and hard to read), not aesthetics. And the "industry standard" (and the "law" of orthography) is "Upper-case words and lower-case words are different." ( :slight_smile: In Germany)

So a configuration switch to allow these differences is important if you are not a native English speaker. BTW: In the migrated database there are more than 4000 tags, and more than 300 tags exist with lower case and upper case (approx. 3700). None of these tags are used twice in notes, because they mark different things. In my case, often names of persons and geographical items.

Please keep in mind that we are all bound to our native language and our culture.

In queries it would be OK if a search for gold* returned "Golden" and "gold". But a real case-sensitive search would be nice to have.

Yes, this requirement adds a lot of complexity. Some query systems I work with solve this in these ways:

Implicit switch:

  • search "golden" -> Case-insensitive search
  • search "Golden" -> If an upper-case letter is in the query, the search is case-sensitive

Or having a switch/operator in the query...
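The implicit switch could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name smartCaseFilter is made up, and prefix matching is assumed so that it mirrors the gold* example above.

```typescript
// Hypothetical "smart case" filter: a query containing any upper-case
// letter is matched case-sensitively; an all-lower-case query ignores case.
// Matching is by prefix (an assumption, to mirror a search like gold*).
function smartCaseFilter(query: string, tagTitles: string[]): string[] {
  const caseSensitive = query !== query.toLowerCase();
  return tagTitles.filter((title) => {
    const candidate = caseSensitive ? title : title.toLowerCase();
    return candidate.indexOf(query) === 0;
  });
}
```

Given the tags "Golden", "gold" and "golden", a query of "gold" would return all three, while "Golden" would return only the surname.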

Well, not just in Germany. And not just in German, for that matter. As with so many things, it depends on context. For decades programmers have had, erm, vigorous disagreements over case sensitivity in programming languages. When it comes to search terms, on the other hand, most users prefer that lower-case searches include upper- and mixed-case results. And given that case sensitivity substantially complicates search operations, the last time I checked Google didn't even provide a way to specify case in search terms.

Having spent decades as an editor in a past life I can attest to a visceral reaction to having the case of my tags coerced into lower case. For those like myself who have a relatively small number of tags to correct, here's my tedious workaround:

  1. Rename the offending current tag in Joplin. E.g., "utf" to "xutf"
  2. Fire up that Evernote desktop app you never got around to deleting and create a dummy note with the desired tag ("UTF").
  3. Export that note to an Evernote ENEX file.
  4. Import that note to Joplin as ENEX (HTML).
  5. Verify you have a new notebook containing the desired note.
  6. Sync your notes and preen over now having the tag(s) you desired in your tag list. (optional)
  7. Search by clicking on the old tag or via search box (e.g. "tag:x*")
  8. Group select the search results and use the add/remove tag function. Out with the old, in with the new.

This works because Joplin still preserves case for imported Evernote tags. It appears you can use a similar dodge to trick Evernote into including various UTF characters in tags. In that case you would start in Evernote and rename the Evernote tag, say "i2c" by copying and pasting the UTF version "I²C" from your computer's clipboard. When importing your shuttle note Joplin will internally sigh and say "Fine, be that way" and then give you your desired tag.

Special type of 'legacy-software': Evernote as front-end for tag editing in Joplin. :flushed:

Understandable tho ...

Hi all,
In French, too, the distinction between lower and upper case makes sense.

Some examples

The first name "Pierre" for Peter and the word "pierre", which refers to the mineral.

The word "église" which is the building and "Église" which is the religious institution.

"État" for the "State" or "état" for the condition of something.

Joplin accepts capitalization in tags without a problem. I have changed some of them directly in the database, without any negative consequences.

Could the "Tagging" plugin or a new plugin take on the role of fine-tuning tags?

If this works with enex files it should also work with a "hardcore" SQL script or on API level.

I can live with such a solution/workaround. Maybe I'll write a plugin for such a "Tag Escape Char" Replacer. As a workaround, better than nothing. One of the reasons I'm moving to Joplin is going away from Evernote. So I don't want to walk along the enex path :slightly_smiling_face:

Maybe the API allows something like that. Hope I'll find some spare time for some experiments.

I started this thread two years ago and it is still going!!

I am not a programmer but regardless I have tried to think of a method for this that, to my simple mind, keeps close to what Joplin is already doing.

A tag record (type_: 5) has this format

joplin

id: 98233abcd5a44aeb80c9a094299d339a
created_time: 2019-04-19T14:24:17.751Z
updated_time: 2019-04-19T14:24:17.751Z
user_created_time: 2019-04-19T14:24:17.751Z
user_updated_time: 2019-04-19T14:24:17.751Z
encryption_cipher_text: 
encryption_applied: 0
is_shared: 0
parent_id: 
type_: 5

Could it work where a tag record has two entries?

joplin
Joplin

id: 98233abcd5a44aeb80c9a094299d339a
created_time: 2019-04-19T14:24:17.751Z
updated_time: 2019-04-19T14:24:17.751Z
user_created_time: 2019-04-19T14:24:17.751Z
user_updated_time: 2019-04-19T14:24:17.751Z
encryption_cipher_text: 
encryption_applied: 0
is_shared: 0
parent_id: 
type_: 5

When a user creates or edits a tag the text typed is converted and stored in line 1 as all lower case (as Joplin does now?). However the text, as typed, is stored in line 2. Any searching / sorting / duplicate checking is handled using line 1 (as I assume it is now). However the displayed text is line 2...
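As a rough sketch of that two-line idea, assuming a made-up record shape (the displayTitle field and the TagStore class are hypothetical and not part of Joplin's schema or code):

```typescript
// Hypothetical record: "title" is the lower-cased key (line 1) and
// "displayTitle" keeps the text exactly as typed (line 2).
interface TagRecord {
  id: string;
  title: string;
  displayTitle: string;
}

class TagStore {
  private records: TagRecord[] = [];
  private nextId = 1;

  // Create the tag, or return the existing one if only the case differs.
  addOrGet(typed: string): TagRecord {
    const display = typed.trim();
    const key = display.toLowerCase();
    for (const record of this.records) {
      if (record.title === key) return record; // duplicate check on line 1
    }
    const created: TagRecord = { id: String(this.nextId++), title: key, displayTitle: display };
    this.records.push(created);
    return created;
  }

  // Sort by the lower-cased key, but show the text as typed.
  displayTitles(): string[] {
    return this.records
      .slice()
      .sort((a, b) => (a.title < b.title ? -1 : a.title > b.title ? 1 : 0))
      .map((record) => record.displayTitle);
  }
}
```

In this sketch, adding "Joplin" and then "joplin" yields a single tag displayed as "Joplin". It also means "Golden" and "golden" would collapse into one tag, so it would not cover the request above for tags that differ only by case.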

Of course this is easy for me to dream up, but it may still be a nightmare to try to implement. It's just that someone has resurrected this post and, two years later, I am still not used to lower-case-only tags :slight_smile:

Well, an enex export is just a text file after all. So once you've created a dummy Evernote ENEX export file even non-programmers can dispense with Evernote entirely and just edit the file to change or add <tag></tag> pairs as you will. I just created a ™ tag as a test using Notepad++, for example. It's a kludge approach but it seems to indicate a plugin fix is feasible. (Says the person volunteering someone else to do the necessary work.)
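For anyone who would rather script that edit than do it by hand, a minimal sketch might look like this. It assumes the export writes tags as plain <tag>text</tag> pairs, as described above; the function names are made up and nothing here is a Joplin or Evernote API.

```typescript
// Escape the characters that are special in XML text content.
function escapeXml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical helper: rewrite every <tag>oldTag</tag> pair in the ENEX
// text to <tag>newTag</tag>, leaving all other content untouched.
function renameEnexTag(enex: string, oldTag: string, newTag: string): string {
  const from = "<tag>" + escapeXml(oldTag) + "</tag>";
  const to = "<tag>" + escapeXml(newTag) + "</tag>";
  return enex.split(from).join(to);
}
```

Reading the file, calling renameEnexTag(text, "utf", "UTF") and saving the result would then stand in for the manual edit in a text editor.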

I don't think this modification is really necessary. As far as I can remember, Joplin lower-cases all entered tag names (either new or pre-existing), but after that (e.g. in searches) it simply does not care about the case. That being so, it doesn't matter if the tag name is stored in all-lower case or using any other arrangement.

So, in my opinion, remove the forced lower-casing and we should be golden. I can't be certain how many times this case conversion is used in the code, but I believe it's a fairly low number. A couple of times, perhaps.

Seems odd that the modification could be so simple when @laurent previously stated:

Maybe it'll happen one day. One can dream...


"Carthago delenda est"

May be… or not :blush:. Right now it is only a speculation either way :blush:

Schrödinger's tags :wink:

Cats love 'em! :cat: :bust_in_silhouette:

Ok, now I was sort of mean. Most of my tags are mixed case because of importing from Evernote. And I want it that way. Therefore, a few minutes ago I just shut down Joplin, started SQLiteAdmin, opened "C:\Users\<user>\.config\joplin-desktop\database.sqlite" and then edited the "tags" table by hand. There were only 20 or so entries to change, which I did.

Then I restarted Joplin and now my tags are like I want them to be. :crazy_face:

Does this change propagate to all clients or would this have to be done for each client?