Limitations of Searching in Joplin?

Operating system

Windows

Joplin version

3.0.6

Desktop version info

Joplin 3.0.6 (prod, win32)

Client ID: ec934152e8ab441e8d1466e69798f9ff
Sync Version: 3
Profile Version: 47
Keychain Supported: Yes

Revision: 18b9f5c

Backup: 1.4.1

What issue do you have?

I know that an unusual word appears at least twice in the body text (not the titles) of the notes in my collection, but when I search for that word in the Desktop version, only one of the two notes that contain it is returned (in the Android version, no notes are returned). Is this normal? I had never noticed any problems with previous searches, and I don't understand why this is happening. I would be happy to open a report on GitHub, but I didn't want to duplicate something that's already there.

We've had reports of similar problems before. In the past, one such issue was caused by null characters added by the Web Clipper or pasted into Joplin (related commit).

Do the missing notes appear in search results if the search is prefixed with a /? For example, instead of test, search for /test. (Prefixing with a / uses a different search system).

Thank you very much for responding to my question. Searching with '/' as a prefix returned both notes, so that fixes my immediate problem.

For the note that didn't appear without that prefix, I recall entering all the Markdown text myself (didn't create it with the Web Clipper), but I may have copied and pasted some of the text from the internet source, because when I examined all the symbols in the Markdown (copied to Notepad++), I found some NBSP scattered around throughout. When I replaced all instances of NBSP with a space, and copied the Markdown text back to Joplin, the normal search returned the previously missing note. UPDATE: But now I have just noticed that the CRLF is also maybe causing the incomplete search results. Are these also considered "null" characters?

There are probably other notes in the collection that have such characters. Would you happen to know if there would be a way to find all of them? I would like to prevent future instances of normal searches not finding things (I am working with users who are not going to understand having to prefix their queries with '/').

I can reproduce this locally — thank you for looking into this!!!

Neither CRLF nor NBSP should be considered NULL characters. For me, NBSP seems to break search only for words immediately preceded by it (in old versions of Joplin, NULL broke search in all content after the NULL character).

For example, if I write, NBSPsearchqueryA searchqueryB (replacing NBSP with a non-breaking space), a search for searchqueryB works, while a search for searchqueryA does not. I suspect this is related to the tokenizer we're using for SQLite.
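
The effect described above can be illustrated with a toy tokenizer. This is only a sketch of the suspected mechanism, not Joplin's or SQLite's actual tokenizer: `naive_tokens` and `matches` are hypothetical names, and the assumption is that only ASCII space, tab, and newline separate words, so a nonbreaking space stays glued to the word that follows it.

```python
import re

NBSP = "\u00a0"  # nonbreaking space


def naive_tokens(text):
    # Hypothetical tokenizer: only ASCII space, tab and newline split words.
    # NBSP is NOT in the separator set, so it remains part of a token.
    return [t for t in re.split(r"[ \t\n]+", text) if t]


def matches(query, text):
    # A whole-word search succeeds only if the query equals some token.
    return query in naive_tokens(text)


note = NBSP + "searchqueryA searchqueryB"
print(matches("searchqueryB", note))  # True
print(matches("searchqueryA", note))  # False: the token is NBSP + "searchqueryA"
```

Under that assumption, the word after the NBSP is indexed with the invisible character attached, which would explain why only that word goes missing.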

I've opened a pull request that should fix this for new notes or existing notes when saved: Mobile,Desktop: Fix nonbreaking spaces and CRLF break search for adjacent words by personalizedrefrigerator · Pull Request #10417 · laurent22/joplin · GitHub.

Until that pull request is merged and released, it may be possible to find some of the notes containing a nonbreaking space by pasting a nonbreaking space into the search bar. This, however, may only find nonbreaking spaces at the beginning of words (e.g. in this is a NBSPtest, but not in this is aNBSPtest).

Another option is to find all notes with NBSPs (and maybe CRLF?) using the debug info plugin's search tool. The plugin, however, won't show where the nonbreaking spaces are in the note (just which notes have them).
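
For anyone who would rather check outside Joplin, a rough alternative is to scan exported notes with a short script. This is a sketch that assumes the notes have been exported as Markdown files into a directory; `find_suspect_notes` is a hypothetical helper and has nothing to do with the plugin. Like the plugin, it reports which files contain the characters, not where.

```python
from pathlib import Path

NBSP = "\u00a0"  # nonbreaking space


def find_suspect_notes(export_dir):
    """Return {path: [labels]} for .md files containing NBSP or CR characters."""
    hits = {}
    for path in sorted(Path(export_dir).rglob("*.md")):
        # newline="" keeps \r\n intact instead of translating it to \n.
        with open(path, encoding="utf-8", newline="") as f:
            text = f.read()
        found = []
        if NBSP in text:
            found.append("NBSP")
        if "\r" in text:
            found.append("CR")
        if found:
            hits[path] = found
    return hits
```

Calling `find_suspect_notes("path/to/export")` (the path is a placeholder) returns a dictionary mapping each affected file to the kinds of characters found in it.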

Wow! Thank you very much. I've installed the plugin, and managed to find a huge list of notes that have NBSP in them (and I can load more, but I will start with the ones I can see). I'm guessing (because I didn't understand it all entirely) that the pull request that you've submitted means that this issue will be fixed in some later release, meaning that normal searching will then work in spite of any (invisible) embedded special characters. I really appreciate your help! Thank you!

With regard to CRLF, LF, CR issues, a fast fix I use is to paste the material into a new document in Notepad++ (free Windows text editor), then go in Notepad++'s menus to "Edit" > "EOL Conversion" > "Windows (CR/LF)". If that's already selected, then simply copying the material back out of Notepad++ and pasting it over the same material in the Windows app for Joplin should fix it. If it still doesn't, then in Notepad++, do Ctrl-F (or use the Notepad++ menus to pick "Search" > "Find"), switch to the "Replace" tab; click "Wrap around" and then "Extended ..." toward the bottom. Then in "Find what:" put \r\n, and in "Replace with:" put \n. Click "Replace All". Do it again, this time replacing \r with \n. Then do it one last time, replacing \n with \r\n. The order matters: replacing \n with \r\n directly would turn every existing \r\n into \r\r\n, so everything is collapsed to a bare \n first. This will ultimately replace every line ending in the material with \r\n (CR/LF) no matter what it was originally (and they could have ended up mixed by copy-pasting from different kinds of sources).
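
The same kind of line-ending cleanup can be sketched in a few lines of Python. This is a minimal illustration, not a Joplin or Notepad++ feature, and `to_crlf` is a hypothetical helper name. It collapses every line ending to a bare LF first so that an existing CR/LF is never doubled, then expands to CR/LF.

```python
def to_crlf(text):
    # Step 1: collapse CR/LF and lone CR to a bare LF,
    # so existing \r\n pairs are not turned into \r\r\n later.
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    # Step 2: expand every LF to CR/LF.
    return unified.replace("\n", "\r\n")


mixed = "one\r\ntwo\nthree\rfour"
print(repr(to_crlf(mixed)))  # 'one\r\ntwo\r\nthree\r\nfour'
```

Running it a second time on its own output changes nothing, which is a quick way to confirm the endings are consistent.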

BBEdit for Mac has similar line-ending and search-replace features (but the expected format there is LF, a.k.a. \n; CR, a.k.a. \r, was the classic Mac OS convention). And I'm sure something similar can be done in various Linux editors (which will also expect LF). Cleaning up material in Android or iOS might be more of a challenge.

All that said, ideally Joplin would just treat all whitespace chars as equivalent (and convert multiple to single) for search purposes. Or maybe have a switch to not do that; I might have some particular reason for searching code for the exact string foo\t\t\tbar without matching foo bar or foo\tbar (where \t = tab character). But when dealing with the average text material, it would be unhelpful to treat foo bar, fooNBSPbar, foo\tbar, foo\r\nbar, etc., as different.
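
As a rough sketch of that idea (not how Joplin's search is implemented; `normalize_for_search` and its `collapse` switch are hypothetical), whitespace could be normalized on both the note text and the query before comparing. In Python, `\s` in a `str` pattern matches Unicode whitespace, which includes the nonbreaking space.

```python
import re


def normalize_for_search(text, collapse=True):
    # collapse=False is the hypothetical "exact whitespace" switch:
    # the text is left untouched so foo\t\t\tbar only matches itself.
    if not collapse:
        return text
    # Replace every run of whitespace (space, tab, CR, LF, NBSP, ...)
    # with a single ordinary space.
    return re.sub(r"\s+", " ", text)


variants = ["foo bar", "foo\u00a0bar", "foo\tbar", "foo\r\nbar", "foo\t\t\tbar"]
print({normalize_for_search(v) for v in variants})  # {'foo bar'}
```

With the switch off, `normalize_for_search("foo\t\t\tbar", collapse=False)` returns the string unchanged, so an exact search for the tab-separated form would still be possible.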

Thank you for the Notepad++ tips. I've been using the debug info 2 plugin to find all the notes containing NBSP. So far none of them have also contained CRLF, but if they do, I will refer to your tips. I've been copying the Markdown text of each found note into Notepad++ and sometimes just manually replacing NBSP with a space, or if there are a lot of them, I use the Search and Replace All. Then I copy the changed Markdown back into the Joplin note. The plugin has found at least a couple hundred of them, so it's slow going. It would be nice if there were some kind of a global Search and Replace within Joplin. Hope there will be that someday. It would be useful for, e.g., harmonizing formatting throughout the collection.

I imagine Joplin stores its material in encrypted form. But if there's really a lot of this material to go through, Notepad++ itself has a feature to search-and-replace in an entire directory structure of files, so a solution could be exporting the entire set of Joplin notes to some format that resolves to plain text (XML, HTML, etc.), then cleaning it up with Notepad++, then re-importing it.
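
The export-and-clean step could also be scripted instead of done in Notepad++. The following is only a sketch: it assumes the notes were exported as Markdown (.md) files into one directory tree, `replace_nbsp_in_dir` is a hypothetical helper, and how Joplin treats the cleaned files on re-import has not been verified here, so it is worth running against a copy of the export first.

```python
from pathlib import Path

NBSP = "\u00a0"  # nonbreaking space


def replace_nbsp_in_dir(export_dir, dry_run=True):
    """Replace NBSP with a plain space in every .md file under export_dir.

    Returns the list of files that contain NBSP. With dry_run=True
    (the default) nothing is written, so the list can be reviewed first.
    """
    changed = []
    for path in sorted(Path(export_dir).rglob("*.md")):
        # newline="" preserves the file's existing line endings on read and write.
        with open(path, encoding="utf-8", newline="") as f:
            text = f.read()
        if NBSP not in text:
            continue
        changed.append(path)
        if not dry_run:
            with open(path, "w", encoding="utf-8", newline="") as f:
                f.write(text.replace(NBSP, " "))
    return changed
```

A cautious way to use it is to call it once with the default `dry_run=True` to see which files would change, and then again with `dry_run=False` once the list looks right.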

That said, the real fix is for Joplin to stop doing unhelpful things in response to NBSP and line endings.

PS: I have not been over every single Joplin plugin. It's possible that one of them has the capacity to do global S&R across all notes. I have "Search & Replace v2.2.0" installed, and it doesn't seem to have that feature, and just works on a per-note basis.

Thank you for sharing additional ideas and mentioning the "Search & Replace v2.2.0" plugin which I might like to have in any case, even if it isn't useful for global replacement of NBSP. I didn't know that Notepad++ could Search&Replace files in an entire directory. I was going to try exporting a few files as Markdown into a directory and using Notepad++ to Search and Replace some files just to see how that works, but first I installed the S&R plugin and found that it works to search for NBSP and replace with a space (without any escape characters). I'm going to do this for a while and see how it goes. It's easier than copying out the markdown to Notepad++, changing it there, and copying it back. Being able to do the S&R within Joplin is easier, even though it's just one Note at a time. Thanks!