Week 1: Project Search Engine

This project has three parts: search filters, a ranking function, and fuzzy search.
I spent most of the week learning the finer details of how to implement them.

Search Filters

I made an intermediate layer (parser) for the search queries. Using some regular expressions, we can now split the search query into its components.
Here is a draft pull request: https://github.com/laurent22/joplin/pull/3213
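
As a rough illustration of the splitting step (this is not the code in the pull request), a regex-based tokenizer might look like the sketch below. The names `Term`, `TOKEN_RE`, and `parse_query`, and the choice of "text" as the name for unfiltered words, are assumptions made for the example.

```python
import re
from typing import List, NamedTuple


class Term(NamedTuple):
    name: str   # filter name such as "title" or "tag"; "text" when no filter is given
    value: str  # the word or quoted phrase, with surrounding quotes removed


# An optional "name:" prefix, followed by either a double-quoted phrase
# or a run of non-space characters.
TOKEN_RE = re.compile(r'(?:(\w+):)?(?:"([^"]*)"|(\S+))')


def parse_query(query: str) -> List[Term]:
    """Split a search query into (filter name, value) components."""
    terms = []
    for match in TOKEN_RE.finditer(query):
        name, quoted, bare = match.groups()
        value = quoted if quoted is not None else bare
        terms.append(Term(name or "text", value))
    return terms


if __name__ == "__main__":
    print(parse_query('title:"open source"'))
    print(parse_query("title:open source"))
```

In this sketch a quoted value stays attached to its filter as one component, while an unquoted second word becomes a separate unfiltered component. Operators such as OR are left as ordinary tokens here; how they are interpreted is outside the scope of the example.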

The parser can now handle all of the existing search syntax. I also made some improvements.
First, the title and body filters behaved inconsistently compared to the other filters.
For example, let’s say I search for

title:"open source"
This doesn’t work.

So let’s say we try this
title:open source

This might work. But what it’s actually doing is searching for notes that have “open” in the title and “source” in the body or title. If we really wanted the title to contain both “open” and “source”, we had to do this
title:open title:source

I fixed it so that the first syntax, title:"open source", now works. The same applies to the body filter.

I also added a new filter, tag, so you can now do this:
tag:office tag:urgent OR tag:important

If you have any suggestions regarding the syntax in my proposal please let me know.
Project proposal.pdf (84.8 KB)

Ranking function

There has been a change of plans here. Joplin currently uses SQLite's FTS4 (Full Text Search 4), whereas FTS5 already has the ranking function I proposed to implement built in. The current plan is to upgrade Joplin to FTS5 if possible and use its built-in ranking instead.
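
To give an idea of what built-in ranking looks like, here is a minimal sketch using Python's sqlite3 module, assuming the function in question is FTS5's `bm25()` auxiliary function. The table name `notes_fts`, the sample notes, and the helper `rank_notes` are made up for the example and are not Joplin's schema. The helper returns None when the local SQLite build does not include FTS5.

```python
import sqlite3
from typing import List, Optional

# Hypothetical sample notes as (title, body) pairs.
SAMPLE_NOTES = [
    ("short", "source source source"),
    ("long", "source appears once in this much longer note about many other unrelated topics"),
    ("groceries", "milk eggs bread"),
    ("meeting", "minutes and actions"),
    ("travel", "flights and hotels"),
]


def rank_notes(query: str) -> Optional[List[str]]:
    """Return matching note titles, best match first, or None without FTS5."""
    conn = sqlite3.connect(":memory:")
    try:
        try:
            conn.execute("CREATE VIRTUAL TABLE notes_fts USING fts5(title, body)")
        except sqlite3.OperationalError:
            return None  # this SQLite build was compiled without FTS5
        conn.executemany("INSERT INTO notes_fts (title, body) VALUES (?, ?)", SAMPLE_NOTES)
        # bm25() returns lower (more negative) values for better matches,
        # so the default ascending order puts the best match first.
        rows = conn.execute(
            "SELECT title FROM notes_fts WHERE notes_fts MATCH ? ORDER BY bm25(notes_fts)",
            (query,),
        ).fetchall()
        return [title for (title,) in rows]
    finally:
        conn.close()


if __name__ == "__main__":
    print(rank_notes("source"))
```

With this sample data, the short note that repeats the term should rank above the long note that mentions it once.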

Fuzzy search

Spellfix is an SQLite extension that can return the words most similar to a given search term. For fuzzy search, I currently plan to augment the text search with three similar words that spellfix returns.
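
The query expansion idea can be sketched as follows. Since spellfix is not bundled with Python, this example swaps in `difflib.get_close_matches` from the standard library as a stand-in for the similarity lookup, so the matches it returns will differ from what spellfix would give. The helper names and the sample vocabulary are hypothetical.

```python
import difflib
from typing import List, Sequence


def similar_words(term: str, vocabulary: Sequence[str], limit: int = 3) -> List[str]:
    """Stand-in for a spellfix lookup: return up to `limit` words close to `term`."""
    # Ask for one extra candidate in case the term itself is in the vocabulary.
    candidates = difflib.get_close_matches(term, vocabulary, n=limit + 1, cutoff=0.6)
    return [word for word in candidates if word != term][:limit]


def expand_term(term: str, vocabulary: Sequence[str]) -> List[str]:
    """Return the original term followed by its similar words."""
    return [term] + similar_words(term, vocabulary)


def to_match_expression(terms: Sequence[str]) -> str:
    """Join the expanded terms into a single OR expression for a full text query."""
    return " OR ".join(terms)


if __name__ == "__main__":
    vocabulary = ["search", "serve", "church", "notebook", "perch"]
    print(to_match_expression(expand_term("serch", vocabulary)))
```

Keeping the original term first means an exact match is still part of the query, with the similar words only widening it.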

Getting spellfix working on all platforms is the next task. Currently, it works on both Windows and Linux.

This project is extraordinary; it fits and completes my CLI workflow perfectly. I’m waiting for updates.

Thanks for the update Naveen. I’ll be checking your pull request again soon.

Very nice! Congrats :smiley: