Week 16: Project Search Engine

Fuzzy search is also working on Windows now.

I also added sorting notes based on edit distance from the search query. This way notes with words that are closer to the original query will go on top.
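As a rough illustration of that ordering, here is a minimal Python sketch. It is not the code used in Joplin: the function names `edit_distance` and `sort_by_closeness` are hypothetical, notes are modelled as a plain title-to-body dictionary, and the distance is a textbook Levenshtein distance rather than whatever score the real implementation uses.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with the usual two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def sort_by_closeness(query: str, notes: dict[str, str]) -> list[str]:
    """Order note titles by the smallest distance between the query and any word in the note."""
    q = query.lower()

    def best(body: str) -> int:
        words = body.lower().split()
        # A note with no words falls back to the query length (hypothetical choice).
        return min((edit_distance(q, w) for w in words), default=len(q))

    # sorted() is stable, so notes with equal scores keep their original order.
    return sorted(notes, key=lambda title: best(notes[title]))
```

With a query such as `imprtant`, a note containing "important" (distance 1) would be listed before one that only contains "importance".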

It seems best to let fuzzy search work without an explicit toggle. I’m still working on a solution to this that requires minimal changes.


Thanks for looking into it @naviji. What’s your approach to removing the need for the explicit fuzzy search toggle? Since the fuzzy search results have a score based on their fuzziness, could this somehow be integrated into the Okapi BM25 logic?

("imprtant" OR "important" OR "importance") AND ("invstmet" OR "investment" OR "investments")

To make a query like this, we either need support for the enhanced query syntax (for parentheses) or need to break the single FTS query into multiple queries (one for each fuzzy match of a word), which will be slower.

Unfortunately, the pre-compiled SQLite that we use does not support the enhanced query syntax, so we'd either need to provide an SQLite build with the required support or split the query.

Edit: Another way would be to filter the result of
"imprtant" OR "important" OR "importance" OR "invstmet" OR "investment" OR "investments",
depending on whether any:1 is present or not, to get notes that contain all the search words. I'm working on this.
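A minimal Python sketch of that post-filtering step, under stated assumptions: the flat OR query has already returned candidate notes, each search word is represented by the set of its fuzzy variants, and `matches` and `filter_notes` are hypothetical helper names, not functions from the Joplin codebase.

```python
def matches(note_words: set[str], groups: list[set[str]], any_mode: bool) -> bool:
    """Check a note against the fuzzy-variant groups.

    Each group holds one search word plus its fuzzy matches. Without any:1
    every group must be hit; with any:1 a single hit is enough.
    """
    hits = (bool(group & note_words) for group in groups)
    return any(hits) if any_mode else all(hits)


def filter_notes(candidates: dict[str, str],
                 groups: list[set[str]],
                 any_mode: bool = False) -> list[str]:
    """Keep the ids of candidate notes (id -> body) that satisfy the groups."""
    return [
        note_id
        for note_id, body in candidates.items()
        if matches(set(body.lower().split()), groups, any_mode)
    ]
```

Splitting on whitespace is a simplification; a real implementation would need to tokenize the same way the FTS index does.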

Spellfix can return a score that indicates its "fuzziness" (the smaller, the better).
https://www.sqlite.org/spellfix1.html#virtual_table_details

I'm sorting the notes by their min fuzziness score as explained in the spec.

And yes, I have integrated this into the existing BM25 logic.


Hi @naviji, I had a look at your GSoC contributions to Joplin, thank you very much for your hard work. I was brought here by a comment on this thread as I’m really interested in the tag-based search functionality described there.
I believe I’m right in thinking your search engine will allow searching for tag intersections using something like tag:"first tag" AND tag:"second tag". Would it be within the scope of your project to code in the ctrl-click tag searching that I and at least a few others would so love to see become a part of Joplin?

Thanks, @paulr. You don’t need to put an “AND” between the tags. tag:t1 tag:t2 will work fine.
If you want notes with tag1 OR tag2 try any:1 tag:t1 tag:t2.
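To make the AND/OR behaviour concrete, here is a small Python sketch of the semantics described above. It is only an illustration, not Joplin's query parser: `parse_tag_query` and `note_matches` are hypothetical names, and the sketch ignores quoted tags and every filter other than `tag:` and `any:1`.

```python
def parse_tag_query(query: str) -> tuple[bool, list[str]]:
    """Split a query into (any_mode, tags), ignoring all other terms."""
    any_mode = False
    tags: list[str] = []
    for term in query.split():
        if term == "any:1":
            any_mode = True
        elif term.startswith("tag:"):
            tags.append(term[len("tag:"):])
    return any_mode, tags


def note_matches(note_tags: set[str], query: str) -> bool:
    """Tags are ANDed by default; any:1 switches the query to OR."""
    any_mode, tags = parse_tag_query(query)
    wanted = (tag in note_tags for tag in tags)
    return any(wanted) if any_mode else all(wanted)
```

A note tagged only with t1 would not match `tag:t1 tag:t2`, but would match `any:1 tag:t1 tag:t2`.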

There is a pre-release available with all of this working. You can find that here.
The documentation for the new search filters can be found here.

The ctrl-click tag searching feature can’t be added to the project scope since GSoC is over. But it seems to be a simple matter of appending tag:tagname to the search query and triggering a search whenever you click on a tag.

If @laurent agrees, I could work on it in my free time. A better spec might help.


Thanks for your reply, I’m excited about the potential of the new search tools, and that there’s a possibility for this extra feature to be added!

To expand, I’m a huge fan of tags for sorting data, and in particular being able to conduct an ‘and’ search of them, because it allows a user to home in on a small subset of appropriately tagged notes very quickly. This is the functionality I’d like to see added to the ctrl/cmd-clicking on a tag, which in the current version of Joplin does nothing.

I think your idea of a ctrl-click triggering a search is great, because it sounds fairly straightforward to implement, and seeing the resulting query in the search box will help users learn about how search queries are constructed. I guess something a little more complex than simply appending the tag search might be needed, to avoid adding redundant terms to the search box. Also, some manipulation of the GUI might be in order, to highlight all the selected search tags in the tag list.
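For what it's worth, the "avoid redundant terms" part could be as small as the following Python sketch. `add_tag_to_query` is a hypothetical helper, not an existing Joplin function, and it assumes tag names without spaces so that the query can be split on whitespace.

```python
def add_tag_to_query(query: str, tag_name: str) -> str:
    """Append tag:<name> to the search query unless it is already present."""
    term = f"tag:{tag_name}"
    terms = query.split()
    if term not in terms:
        terms.append(term)
    return " ".join(terms)
```

Clicking "work" and then "urgent" would build up `tag:work tag:urgent`, and clicking "work" again would leave the query unchanged.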

If you need any more thoughts on the spec I’ll be happy to respond.

@naviji, just wondering if there’s likely to be any movement on this? Sorry to hassle!

As far as I know, this feature request hasn’t been given the all clear by Laurent.

I’m hesitant to start working on this without that. (Also, I’ve got a lot on my plate for the next few weeks)
I do like the idea in general.

Edit: I think it’s best to wait for the UI update first.

Thanks for your great work @naviji :+1:
Could you add support for all characters (é è ê â...), spaces, and symbols like @ # - ..., at least for tag search?

Edit: symbols work.

Should be fixed in the next pre-release.

Good! :slightly_smiling_face: