Home / GitHub Page

Do we need an AST?

Converting the search query to an AST will increase flexibility. But after giving it some thought, I wonder if it may be over-engineering it.

From what I understand, creating an AST in full generality requires a lexer, a parser, and a grammar for the query. Building these from scratch would be difficult, so I would need to depend on a library for this. And even after creating this tree, mapping it into a single SQL query would be hard.

Having boolean connectives (and/or/not) among all filters, tags, and search terms seems both unnecessary and complicated. Considering that not even StackOverflow’s search does this, I’d say it would be better to go for a limited version for now.

I suggest the following:

Limit boolean search to tags, title, and body. So we can do this -
tag:office and tag:important -tag:spam title:"Week" or body:"boring"

But not this:
is:note or is:todo and created:20200119 or updated:20191103

Instead of boolean connectives between the other filters, let’s instead allow multiple filters of the same type.
in:notebook1 in:notebook2 created:day-10 created:20200512 updated:day-5

Thoughts?

This seems like the right thing to do. But I’m not exactly sure what the distinction is between option 1 and 2

Is Option 1
(tag:office and tag:important) -tag:spam (title:"Week" or body:"boring)"

and Option 2
(is:note or is:todo) and (created:20200119 or updated:20191103)

And what if you have a note like
tag:office and tag:important -tag:spam title:"Week" or body:"boring" updated:20191103
Does that work?

IMHO, if you use a well tested library for the AST, it would mean less possible bugs and less testing to do rather than you writing out your own search protocol, because then you’d have to intensely test your code and even then some bugs are likely to remain until the feature is released and crowd-tested. (Non-regular grammars are notoriously difficult to prove correct)

An AST would be perhaps slightly over-engineered a solution, but on the other hand would be a much more stable solution and also mathematically verified. The grammar also would not be very hard to come up with, since a search expression falls under first-order logic.

My personal favourite library for this is Antlr. It is fairly simple to use and has been used very widely (apparently also by Guido van Rossum, the benevolent ex-dictator of Python).

Yes I’ve suggested an AST but you’re right that it might be too much since the search syntax is not too complex.

I don’t think we should support “and”, “or”, etc. - internally it should just be “and” between all the search terms. Then user can use the “-” option to refine the search results.

So if you get rid of and/or, you only have a flat array of token and won’t need an AST.

1 Like

Absolutely, maintainability is one of the greatest advantages of Joplin compared to many other open source projects.
I am glad you value maintainability over complex functionality. :smile:

Yes that would work.

But not this
tag:office and tag:important -tag:spam title:"Week" or body:"boring" or updated:20191103

We allow and/or/not between tags and between title, body, and just plain text but not between the other filters. They are always connected using and.

So something like this:

(Filter type#1) AND (Filter type #2) AND (Filter type #3) ...

All filters of different type are connected by AND internally.

But there can be boolean connectives between filters of the same type.

For example, let’s take the tag filter.
The current implementation already supports boolean search for tags.
tag:tag1 or tag:tag2 and tag:tag3 -tag:tag4

And we can use the existing FTS4 support for and/or/not to allow boolean search for text filters.
title:title1 or body:body1 and "hello world" -title:lazy

But all other filters are connected together using OR between filters of same type.

Let’s take the notebook filter. I think it’s desirable to allow this:
notebook:notebook1 notebook:notebook2 notebook:notebook3

This will be interpreted as
notebook:notebook1 OR notebook:notebook2 OR notebook:notebook3

Likewise for other filters like created, updated.

Automatic inference of boolean connectives depending on context…
Interesting.
I guess will need thorough documentation

I see what you mean, so I guess there should be some implied operators depending on the type? I think that makes sense, so the query:

  • tag:tag1 tag:tag2 will be translated to tag:tag1 AND tag:tag2 and will return all the notes that contain both tag1 and tag2.

  • notebook:nb1 notebook:nb2 will be translated to notebook:nb1 OR notebook:nb2 and will return all notes in nb1 or nb2.

The connection AND/OR is only done at the backend level (user shouldn’t have to type them) and we still don’t need an AST for this as the connection is fixed depending on the terms.

Actually, perhaps you should have a document on the forum with a table that lists all the prefixes we support, with description and some examples? Like Evernote does: https://help.evernote.com/hc/en-us/articles/208313828-How-to-use-Evernote-s-advanced-search-syntax That would make it easier to discuss the search engine and get to a final spec.