GSoC idea - Hierarchical Tags

This is the discussion topic for the GSoC idea named in the title.
Anything about how this should be done, which features it should include, etc. is discussed here unless a dedicated topic already exists; see the idea description below.
Please announce your interest in this idea here; otherwise it easily gets lost, as we would need to remember each introduction.

This topic is also used to update the specification of the idea, even if there is an existing topic, so interested students, watch it!
Anything that should be discussed privately, e.g. because it involves your proposal, will go through a private channel, which is currently under discussion.

At the time of writing, the idea’s description at https://joplinapp.org/gsoc2020/ideas.html#3-hierarchical-tags is:

One of the most asked-for features in Joplin is support for hierarchical tags. This would allow users who rely heavily on tags to organise them into a hierarchy, as is done for the notebooks.

Expected Outcome: The tags can be organised into a hierarchy

Difficulty Level: Moderate
Platforms: Desktop, Mobile and Terminal
Skills Required: JavaScript; React; React Native (for mobile)
Potential Mentor(s): laurent22
More info: GitHub issue

Hi! I am interested in working on this feature for GSoC and am looking into it. I am wondering about the migration cost and the design for backward compatibility, which I think may be very painful...

  1. How should the user attach a nested tag to a note? Note that we have a CLI, so we have to find a pure-text way to do that. I think we could allow only alphanumeric characters in tag names in future versions, and use xxx/yyy to indicate the hierarchy. If a user already has some tags with / in their names, use // to escape. [^1]
  2. What if a user uploads some nested tags and then uses Joplin on another device with an older version? Note that we can't store nested tags in the DB as name = xxx/yyy, or they will conflict with some top-level tags. We can add a column to indicate the parent ID of a tag, though. But in old versions the backend is different, and the app may get confused by several tags with the same name. [^2]
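
The xxx/yyy syntax with // as an escape could be sketched roughly as follows. This is only an illustration in Python, not Joplin code; `split_tag_path` and `join_tag_path` are hypothetical helper names.

```python
def split_tag_path(path: str) -> list[str]:
    """Split 'xxx/yyy' into tag names, reading '//' as a literal slash."""
    parts: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(path):
        if path.startswith("//", i):
            # Escaped slash: belongs to the current tag name.
            current.append("/")
            i += 2
        elif path[i] == "/":
            # Single slash: hierarchy separator, close the current name.
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(path[i])
            i += 1
    parts.append("".join(current))
    return parts


def join_tag_path(parts: list[str]) -> str:
    """Inverse direction: escape slashes inside names, then join with '/'."""
    return "/".join(name.replace("/", "//") for name in parts)


print(split_tag_path("Productivity/Joplin"))  # ['Productivity', 'Joplin']
print(split_tag_path("TCP//IP/notes"))        # ['TCP/IP', 'notes']
print(join_tag_path(["TCP/IP", "notes"]))     # TCP//IP/notes
```

One caveat with doubling as the escape: a name that starts or ends with / becomes ambiguous next to a separator. For example, a///b could mean "a/" followed by "b", or "a" followed by "/b". The sketch reads left to right and returns the first interpretation.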

Of course, there is always an ultimate solution: replace the user's xxx/yyy with xxx_yyy or another pattern on the first launch of the new version, and tell users "Hey surprise! We changed some of your tag names! You deserve this for having used strange symbolic tag names!" Not sure if we are gonna do this... :joy:

[^1]: If using \ to escape, the terminal may intercept and escape undesirably, and one needs to write \\ to actually let Joplin escape backslash. Another thought is using xxx.yyy to indicate hierarchy. But using either .. or /. or \\. to escape . itself is painful. // is at least acceptable for LaTeX users.
[^2]: Or we can still use xxx/yyy to store nested tags, and replace all xxx/yyy in the server with escaped version xxx//yyy. Switching to old versions, some tags may be interpreted wrongly, but at least there will be less confusion?

Not a problem, as we can require all apps to be updated before the feature can be used. A migration guide might be necessary.

You may want to look at photo management applications; for them this is a common thing.
I use digiKam, which supports hierarchical tags. It stores the tags in its database, but it also has to interpret the tags written in the files' metadata.
Maybe the related discussions on the mailing list can help:
http://digikam.1695700.n4.nabble.com/template/NamlServlet.jtp?macro=search_page&node=1695700&query=hierarchy&n=1695700

Thanks for the explanation! I will check it out.

After observing the behavior of digiKam and the structure of its database, I think there are some UX improvements we can make on top of digiKam.

  1. When attaching an existing nested tag to a picture, its parents are not attached to it automatically. It will be more logical to do so automatically.
  2. Similarly when we remove a tag from a picture, we should recursively remove its children as well.

The implementation and UI/UX of digiKam's hierarchical tag feature is quite similar to my assumption above though haha😆!

By the way, I notice your GSoC instructions say

Come up with a project that you're interested in and discuss it in the Development category
Write a first draft and get someone to review it
Remember: you must link to work such as commits in your proposal. A private place will be created within the forum for that purpose.

And I wonder whether I can start writing a proposal now (and I'd love to), or whether I need to finish my commit for the good-first-issue and wait until it's merged before starting the proposal?

It’s preferred to have at least 1 PR accepted, regardless of when it happens, so the dev team can see what you’re capable of. But if you have a proposal you’d like to run by us, send a PM with both Pack (above) and me tagged in it, and we’ll go from there.

As bed says, a PR is required, but you can start drafting right now via PM.

The timeline at https://developers.google.com/open-source/gsoc/timeline would allow us to give you another week after the application deadline to finalize a PR, but us knowing you is still a qualifying criterion for being selected.

@bedwardly-down @PackElend Got it. Thank you!

I am a little bit confused by this description, so I want to describe how I would expect the feature to look:

  • In general I would expect behavior similar to the nested folder function
  • There should be no difference between adding a tag and adding a nested tag
  • To avoid confusion, I would expect to see only the last-level tag at the top of the note (for example, from @packelend's list: if I tag a note with city, I would expect city to be visible within the note, but not state, country and places)

I hope this is somewhat helpful :wink:

Maybe this helps with regard to nested tags; it is quite a common example for photos:

- animal
  - cat
  - dog
  - ...
- people
  - you
  - me
  - ...
- plants
  - tree
  - bush
  - seeds
  - ...
- places
  - country
    - state
      - city

Maybe we should also think about whether we still need folders if we have nested tags, because it's the same logic (maybe). Just an idea, don't know if it's worth discussing :slight_smile:

Folders are rigid, tags are flexible:

  • You put a note in one folder, but not the same note in different folders.
  • You can tag multiple notes with the same tag.

Of course you can create a folder structure based on tags. It is similar to what happens here in the forum.
You have categories and sub-categories; anything else has to be solved by means of tags.
See the related discussion on meta.

But that's what I'm trying to say.
Folders are rigid: I can only apply one folder to one note.
This way I can organize my notes in notebooks.

But if we implement hierarchical tags, I can create tag trees which replace my notebooks.
So I just wanted to ask (as there is no backend magic related to notebooks, as far as I understand) whether we still need folders, or whether we can simplify the UI by merging the folder and tag functions.
If there is no interest, fine, but I just wanted to raise this question :wink:

Don’t get me wrong, I see advantages in a tag-based structure and wouldn’t mind having it implemented, but it would be a parallel thing.
There are many pros and cons of flat vs. deep hierarchies; you will find plenty of opinions about this on meta and in the Discourse blog.
More important is the fact that users are used to the folder structure, and we cannot throw out their current structure.
Of course, that can be simulated by tags, but how do we distinguish between tags and folders when they have the same name? Details like this have to be respected, so the best approach would be to allow co-existence before we merge at a later stage.

Hi @nr458h @PackElend Sorry for the extremely late reply! Our uni announced online courses and I was busy travelling back home to China. In my previous description, I expected the following behavior:

Suppose we have the following tags:

  1. Joplin (top-level)
  2. Productivity/Joplin/User Guide
  3. Productivity/Joplin/Development

And we have notes A-D. I use the term Productivity->Joplin to indicate this single tag named Joplin in the database with its unique ID. I use the term Productivity/Joplin to indicate the tag hierarchy on the UI level that the user will input/see.

  1. Attaching Joplin to A will attach the top-level Joplin label to A. Displayed as Joplin
  2. Attaching Productivity/Joplin to B will attach Productivity and Productivity->Joplin to B (2 labels in total). Displayed as Productivity, Productivity/Joplin.
  3. Attaching Productivity/Joplin/Development to C will attach Productivity, Productivity->Joplin and Productivity->Joplin->Development to C (3 labels in total). Displayed as Productivity, Productivity/Joplin, Productivity/Joplin/Development.
  4. Attach Productivity/Joplin/User Guide to D. Then search by Productivity/Joplin. Will return B, C, D.
  5. Attaching Productivity/Joplin to A afterwards will attach Productivity and Productivity->Joplin to A. Now A has 3 labels, displayed as Joplin, Productivity, Productivity/Joplin.
  6. Removing Productivity/Joplin from C removes both P->Joplin and P->Joplin->Development labels. Now C has only one label Productivity.
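
The six steps above can be replayed with a small in-memory model. This is only a sketch of the proposed behavior, not Joplin code: `TagStore` and its methods are hypothetical names, and tags are identified by their full path (a tuple of names) instead of database IDs.

```python
from collections import defaultdict


class TagStore:
    """Hypothetical model: attaching a nested tag also attaches its ancestors."""

    def __init__(self) -> None:
        # note id -> set of tag paths, each path stored as a tuple of names
        self.note_tags = defaultdict(set)

    def attach(self, note: str, path: str) -> None:
        parts = tuple(path.split("/"))
        # Attach every prefix: Productivity, Productivity/Joplin, ...
        for depth in range(1, len(parts) + 1):
            self.note_tags[note].add(parts[:depth])

    def detach(self, note: str, path: str) -> None:
        parts = tuple(path.split("/"))
        # Remove the tag itself and all of its descendants from this note.
        self.note_tags[note] = {
            tag for tag in self.note_tags[note] if tag[: len(parts)] != parts
        }

    def labels(self, note: str) -> list[str]:
        return sorted("/".join(tag) for tag in self.note_tags[note])

    def search(self, path: str) -> list[str]:
        parts = tuple(path.split("/"))
        return sorted(n for n, tags in self.note_tags.items() if parts in tags)


store = TagStore()
store.attach("A", "Joplin")                           # step 1
store.attach("B", "Productivity/Joplin")              # step 2
store.attach("C", "Productivity/Joplin/Development")  # step 3
store.attach("D", "Productivity/Joplin/User Guide")   # step 4
print(store.search("Productivity/Joplin"))            # ['B', 'C', 'D']
store.attach("A", "Productivity/Joplin")              # step 5
print(store.labels("A"))  # ['Joplin', 'Productivity', 'Productivity/Joplin']
store.detach("C", "Productivity/Joplin")              # step 6
print(store.labels("C"))                              # ['Productivity']
```

Because every ancestor is stored on the note, searching is a plain membership test; the cost is extra rows per note, which is the trade-off discussed further down.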

This approach allows tags in different hierarchies to have the same name. It differs from digiKam in the following aspects:

  1. In digiKam, after step 5, A’s tags are displayed as “Joplin”, “Productivity”, “Joplin”. This creates confusion.
  2. In digiKam, step 6 only removes “P->Joplin” but not “P->Joplin->Development”, which is kind of weird.
  3. Similar to 2, digiKam actually has a label management panel, where you can attach any single tag to the photo without attaching its ancestors, or remove any single tag without removing its descendants.

If I get it right, your idea says:

  1. Attaching “Productivity/Joplin/Development” to C will attach only “Productivity->Joplin->Development” to C (1 label in total). Displayed as “Development”.

  2. Following 1, to avoid confusion, we should not allow tags with the same name even if they are in different hierarchies. This makes sense though, because I read the source code afterwards and noticed that there is an index on tag names in the current database.

  3. Following 1, the parent tags are not attached. But if the tag panel looks like this


    And the user clicks the orange area, intuitively the user will expect C to appear in the list. Hence the software should perform a recursive lookup to get the notes with child tags (e.g. P->J->Dev) of the selected tag (e.g. P->J).

  4. Generalizing 3, when the user searches by “Productivity/Joplin”, C should also appear in the search result to preserve consistency.

  5. Since the names are unique, when the user searches by “Development”, should C also appear in the result? This is a little confusing.

  6. Following 3, it is invalid for the user to remove “Productivity/Joplin” from C. Also, when the user removes “Productivity/Joplin/Development” (actually the user only needs to input “remove ‘Development’” since the name is unique), C will have no tag left.
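
The recursive lookup from points 3 and 4 could look roughly like this in SQLite, here driven from Python's built-in `sqlite3` module. The schema is a simplified assumption for illustration: the `parent_id` column is the hypothetical addition discussed in this thread, integer IDs are used for brevity, 0 stands for "no parent", and `notes_under_tag` is a made-up helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        parent_id INTEGER NOT NULL DEFAULT 0  -- hypothetical column, 0 = top level
    );
    CREATE TABLE note_tags (note_id TEXT NOT NULL, tag_id INTEGER NOT NULL);

    INSERT INTO tags (id, title, parent_id) VALUES
        (1, 'Productivity', 0),
        (2, 'Joplin', 1),
        (3, 'Development', 2);
    -- Only the deepest tag is stored on each note.
    INSERT INTO note_tags (note_id, tag_id) VALUES ('B', 2), ('C', 3);
""")


def notes_under_tag(conn: sqlite3.Connection, tag_id: int) -> list[str]:
    """Return notes carrying the given tag or any tag below it."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(id) AS (
            SELECT id FROM tags WHERE id = ?
            UNION ALL
            SELECT tags.id FROM tags JOIN subtree ON tags.parent_id = subtree.id
        )
        SELECT DISTINCT note_id FROM note_tags
        WHERE tag_id IN (SELECT id FROM subtree)
        ORDER BY note_id
        """,
        (tag_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(notes_under_tag(conn, 2))  # ['B', 'C']  (P->J and its child P->J->Dev)
print(notes_under_tag(conn, 3))  # ['C']
```

The recursive common table expression walks down from the selected tag, so clicking P->J in the tag panel would list C even though only P->J->Dev is attached to it.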

Compared to my previous idea, yours saves space in the database and UI, but requires the backend lib to do more. It also requires users to keep their hierarchy design in mind all the time, because the UI lacks hints.

Sorry for writing so much, but I just came up with a merge of our ideas. It is mostly based on yours, except that:

  • Although only the “P->J->Development” tag is stored, the UI should display it as “Productivity/Joplin/Development”.
  • In this way the user can use the same name in different hierarchies (say Joplin/Dev, Evernote/Dev). We should determine whether a tag already exists by two columns, “name” and “parent ID”.
  • The user must input “Productivity/Joplin/Development” to remove the tag.
  • The user can search for C by “Productivity/Joplin” or “Productivity/Joplin/Development”. The user cannot search for C by “Development”.
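
Identifying a tag by the two columns “name” and “parent ID” could be sketched like this, again with Python's built-in `sqlite3` module. Everything here is an assumption for illustration: the table layout, the integer IDs with 0 meaning "no parent", and the helper name `resolve_tag`.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tags (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        parent_id INTEGER NOT NULL DEFAULT 0,  -- 0 = top level
        UNIQUE (parent_id, title)              -- same name allowed under different parents
    )
""")


def resolve_tag(conn: sqlite3.Connection, path: str, create: bool = False) -> Optional[int]:
    """Walk 'A/B/C' from the top level; return the ID of the last tag, or None."""
    parent_id = 0
    for title in path.split("/"):
        row = conn.execute(
            "SELECT id FROM tags WHERE parent_id = ? AND title = ?",
            (parent_id, title),
        ).fetchone()
        if row is not None:
            parent_id = row[0]
        elif create:
            cursor = conn.execute(
                "INSERT INTO tags (title, parent_id) VALUES (?, ?)",
                (title, parent_id),
            )
            parent_id = cursor.lastrowid
        else:
            return None
    return parent_id


joplin_dev = resolve_tag(conn, "Joplin/Dev", create=True)
evernote_dev = resolve_tag(conn, "Evernote/Dev", create=True)
print(joplin_dev != evernote_dev)                     # True: two distinct "Dev" tags
print(resolve_tag(conn, "Joplin/Dev") == joplin_dev)  # True: the full path resolves
print(resolve_tag(conn, "Dev"))                       # None: no top-level "Dev" exists
```

With this lookup the full path is the only way to address a nested tag, which matches the last two points: “Development” alone neither removes nor finds the tag.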

Hope this is helpful, and please correct me if my understanding is wrong!

Sincerely,
Zeyu (fhfuih)

Also @nr458h’s question on using folders is quite reasonable. I have tried WordPress, Ghost, Hexo and Hugo to write my blog. And it seems they all agree on two taxonomies in parallel: a one-level tag collection and a multi-level category tree.

This blog article also proposes an interesting, recursion-free alternative for storing the hierarchy information. Maybe we can try this?

Reads as if you have done some homework :slight_smile:

@fhfuih, please note that there’s an open PR about hierarchical tags: https://github.com/laurent22/joplin/pull/2572 It hasn’t been reviewed yet and I don’t know if it changes something to your proposal but maybe something to consider.

I see… Thanks for letting me know about it. I will try to understand the source code and see how it works.

So since it is a pending PR, if I want to continue working on this topic for my GSoC and there is something I can build on top of this PR, I guess I should modify this PR instead of opening another one, right?

Or if this one is already well-done, maybe I should look for some other ideas as well.
