GSoC Idea - Search

This is the discussion topic for the idea named in the title.
Anything about how this should be done, which features it should include, and so on is discussed here unless a dedicated topic already exists; see the idea description below.
Please announce your interest in this idea here; otherwise it easily gets lost, as we would have to remember each introduction.

This topic is also used to update the specification of the idea, even if there is an existing topic, so interested students, watch it!
Anything that should be discussed privately, e.g. because it involves your proposal, will go through a private channel; which channel is still being discussed.

At the time of writing, the idea’s description at https://joplinapp.org/gsoc2020/ideas.html#7-search is:

The current search engine is built on top of SQLite FTS. An index of the notes is built and this is what is used by FTS when searching.

While it works relatively well, there is still room for improvement. In particular we would like to implement the following:

  • Allow boolean searches - search for "A and B", or "A or B", etc.
  • Remove the need for wildcard queries - for example instead of typing "search*", it will be possible to simply type "search" and results that contain "search" or "searching" will be included. Those that contain the exact match will come first.
  • Search within certain tags (eg. "tag:software search" to search within the notes tagged with "software" and that contain the word "search").
  • Improve relevance algorithm (give a weight to certain criteria, and allow adding new criteria more easily). In particular give more weight to recently modified notes, and less weight to completed to-dos.
  • Allow fuzzy search (for example return results that contain "saerch" for the query "search")
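
To make a few of these points concrete, here is a minimal sketch of how exact, prefix and fuzzy matches could be ranked together with recency and to-do weighting. It is not how Joplin's search works (that is built on SQLite FTS); the note fields (`id`, `body`, `updatedTime`, `isTodo`, `todoCompleted`), the function names and all weights are hypothetical and chosen only for illustration.

```javascript
// Optimal string alignment distance: counts inserts, deletes, substitutions
// and swaps of two adjacent characters (so "saerch" -> "search" costs 1).
function editDistance(a, b) {
  const d = [];
  for (let i = 0; i <= a.length; i++) d.push([i]);
  for (let j = 1; j <= b.length; j++) d[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost);
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
      }
    }
  }
  return d[a.length][b.length];
}

// Hypothetical weights: exact > prefix > fuzzy, as the idea description asks.
const WEIGHTS = { exact: 1.0, prefix: 0.6, fuzzy: 0.3, recency: 0.5, completedTodo: 0.5 };

// Score one query term against one word of a note.
function matchScore(term, word) {
  if (word === term) return WEIGHTS.exact;
  if (word.startsWith(term)) return WEIGHTS.prefix; // "search" matches "searching"
  // Only allow typos on longer terms, to avoid matching nearly everything.
  if (term.length >= 4 && editDistance(term, word) <= 1) return WEIGHTS.fuzzy;
  return 0;
}

// Split text into lowercase words (letters and digits of any script).
function tokenize(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Score a note: every term must match somewhere (implicit AND).
function scoreNote(note, terms, now) {
  const words = tokenize(note.body);
  let score = 0;
  for (const term of terms) {
    const best = Math.max(0, ...words.map(w => matchScore(term, w)));
    if (best === 0) return 0;
    score += best;
  }
  // Recently modified notes get a boost that fades with age.
  const ageDays = Math.max(0, (now - note.updatedTime) / 86400000);
  score *= 1 + WEIGHTS.recency / (1 + ageDays / 30);
  // Completed to-dos are down-weighted.
  if (note.isTodo && note.todoCompleted) score *= WEIGHTS.completedTodo;
  return score;
}

// Return the matching notes, best first.
function search(notes, query, now) {
  const terms = tokenize(query);
  if (terms.length === 0) return [];
  return notes
    .map(note => ({ note, score: scoreNote(note, terms, now) }))
    .filter(r => r.score > 0)
    .sort((x, y) => y.score - x.score)
    .map(r => r.note);
}

// Usage with made-up notes, all modified at the same time.
const now = Date.UTC(2020, 2, 29);
const sampleNotes = [
  { id: 'a', body: 'How to search notes', updatedTime: now, isTodo: false, todoCompleted: false },
  { id: 'b', body: 'Searching tips', updatedTime: now, isTodo: false, todoCompleted: false },
  { id: 'c', body: 'fix saerch bug', updatedTime: now, isTodo: false, todoCompleted: false },
  { id: 'd', body: 'search done', updatedTime: now, isTodo: true, todoCompleted: true },
  { id: 'e', body: 'unrelated', updatedTime: now, isTodo: false, todoCompleted: false },
];
console.log(search(sampleNotes, 'search', now).map(n => n.id)); // → [ 'a', 'b', 'd', 'c' ]
```

Keeping the criteria in one weights object is one way to make "adding new criteria more easily" possible. Boolean operators and the `tag:` filter are left out of this sketch.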

Expected Outcome: To be defined with the student. Depending on what features they would like to implement.

Difficulty Level: Medium
Skills Required: JavaScript
Potential Mentor(s): laurent22
More info: Search engine improvements


An idea was discussed recently that I want to share here.
It could run in parallel, meaning two students would work on the search topic.

If you keep thinking about search, it can open doors towards natural language processing, deep learning for NLP, and machine translation.
I found some reads:

The background is that I would love to see multi-language support: many of my notes are in English, I am German, so I have a good share of German notes, and my partner is French, so if we share notes with our son, he has to go through three languages :sweat_smile:.
Multi-language/synonym support for tags could be a first step towards easing the situation.


Have a look in here too.

Is it necessary that we use SQLite FTS for the search feature? I ask because I just found another JS library that fits our use cases better. It supports fuzzy search and other extended search syntaxes.

As far as I understand, currently the index is built in the SQLite database. But, the library I saw will require the index to be in a JSON format. Is that acceptable in our project?
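
For readers unfamiliar with the distinction, here is a rough sketch of what a JSON-serializable index could look like. It is a generic inverted index, not the format of the unnamed library or of Joplin's SQLite index; the note fields and function names are hypothetical.

```javascript
// Split text into lowercase words (letters and digits of any script).
function tokenize(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Build an inverted index: each word maps to the ids of the notes containing it.
function buildIndex(notes) {
  const index = new Map();
  for (const note of notes) {
    for (const word of new Set(tokenize(note.body))) {
      if (!index.has(word)) index.set(word, []);
      index.get(word).push(note.id);
    }
  }
  return index;
}

// A Map is not directly JSON-serializable, so store it as [word, ids] pairs.
const serializeIndex = index => JSON.stringify([...index]);
const loadIndex = json => new Map(JSON.parse(json));

// Look up the ids of notes containing a word (case-insensitive).
const lookup = (index, word) => index.get(word.toLowerCase()) || [];
```

The JSON string would have to be stored somewhere and kept in sync with the notes, which is part of the trade-off against an index that lives in the database.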

PS: I didn’t mention the name of the library as it is directly related to my proposal idea. I would be glad to give more details if the discussion is done in private.

See GSoC Live Blog.

@Mentors, I cannot answer this. What do you think?

I think that finding out in private what the library in question is would be a good first step. One of the core issues I’m seeing with this project is that at least a small handful of the libraries that have been used are either having problems staying maintained or have had all support dropped altogether (look at the share library used for mobile clients). This is a common occurrence in open source, but Node.js has such a massive number of libraries available for different features that the problem is often multiplied: one library becomes the main one used in projects, only to go stale and pretty much die, causing a domino effect. My two cents, though.


Yeah, I saw this. But I think Laurent mentioned somewhere that even the first draft of the proposal should be almost complete. However, I cannot come up with a proposal without getting the answers to the questions I asked above. Can I still send a private message to @Mentors about the JS library for fuzzy search?

Yes.
What Laurent wants to avoid is that we are spammed with almost empty drafts. Your intended direction should be clear, but @bedwardly-down and I have no problem with it evolving. Laurent may join the party later.


You're probably right. But I have no idea how to tell whether using this library is a good trade-off.

Have a look here.

I know it is probably too late to reach out, but I only found out about the Google Summer of Code yesterday. Even if it is not possible to join the SoC at Joplin at this point in time, I very much look forward to contributing to the development of this feature. #gsoc-2020
I am currently pursuing an MS in CS at the University at Buffalo.

Coming up with a solid proposal by the day after tomorrow is quite challenging, isn't it?

We are thinking about applying to SoD (Season of Docs) too, as the docs need to be improved.

Which year are you in? We will hopefully be part of GSoC 2021 again.
You can also mentor in GSoC 2020 and become a student in 2021.

Yeah, it would be quite challenging to come up with a proposal by the 31st. But I would like to take the rest of today to understand what kinds of approaches might be helpful for developing the Search feature.
I am currently in my 2nd semester. UB offers a 1.5-year course, so it is unlikely that I will be able to join next year.
I want to be able to contribute to open source projects such as this! So what course of action would you suggest, specifically regarding Joplin?
Also, thank you so much for the quick response, even so close to the deadline!

Welcome, and good that you are realistic too :slight_smile:
Any kind of contribution, such as fixing bugs, is welcome. You could also mentor the search project if you like, provided Google grants us enough student slots.

This is an excellent first step.

One of the core issues I’m seeing with this project is that at least a small handful of libraries that have been used are either having problems staying maintained or all support has been dropped altogether (look at the share library used for mobile clients).

The share library is the only one that really comes to mind and it was dropped several months ago in favour of manually integrating share.

If you have any libraries that concern you please feel free to share them with the community (ideally make a new thread for this) and hopefully we can come up with some alternatives that aren’t problematic!