Improving search with Asian scripts

Sorry, I just tested the original app v1.8.5 and the 'side effects' exist there too, so they are not introduced by your modification.

Well yes, these queries would use the default FTS mode because they only contain English letters, but the notes contain Chinese characters, so I guess that's what's causing problems. I'm not really sure how that could be fixed though.

Does it mean notes that contain Chinese characters are not indexed by FTS?

Hm, well, now that I think about it, that's probably not the case. According to the SQLite docs, Unicode characters are simply skipped, but the rest of the note should still be indexed. But then I don't know what's causing the issues mentioned by @novelx. I'll look into it later today.

Maybe it's because "English words" sits right next to Chinese characters, without spaces, as in 上面的括号是全角English words ("the parentheses above are fullwidth")? It could be that when normalising the note content we should strip out all non-Latin scripts, so that the rest can be indexed properly by FTS.

I think I've figured out why this is happening: in the example provided by novelx, the parentheses and the colon were not regular ASCII characters, but their fullwidth forms (（, ）, ：), which are designed for Asian scripts. The FTS4 documentation says:

A term is a contiguous sequence of eligible characters, where eligible characters are all alphanumeric characters and all characters with Unicode codepoint values greater than or equal to 128. All other characters are discarded when splitting a document into terms.

Because these special characters have codepoint values greater than or equal to 128, they don't break the word and are not discarded the way regular parentheses are. So 会议：Meeting is indexed as a single term and therefore does not match the query meeting. The same goes for （Asian） and 上面的括号是全角English. It seems FTS can sometimes match part of a word (it does match 全角English words), but not in this case, so I don't really get the rules for that.
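
To make the behaviour concrete, here is a minimal sketch of the tokenizer rule, assuming Node.js with the better-sqlite3 package and an SQLite build that includes FTS4 (this is not Joplin code, just an illustration):

```typescript
// Illustration of the FTS4 "simple" tokenizer rule quoted above.
import Database from 'better-sqlite3';

const db = new Database(':memory:');
db.exec('CREATE VIRTUAL TABLE notes_fts USING fts4(body)');

// The fullwidth colon (U+FF1A) has a codepoint >= 128, so it is an
// "eligible" character and the whole string is indexed as one term.
db.prepare('INSERT INTO notes_fts (body) VALUES (?)').run('会议：Meeting');

// An ASCII colon and a space are not eligible characters, so here
// "会议" and "Meeting" are indexed as two separate terms.
db.prepare('INSERT INTO notes_fts (body) VALUES (?)').run('会议: Meeting');

const rows = db
  .prepare('SELECT body FROM notes_fts WHERE notes_fts MATCH ?')
  .all('meeting');

// Only the second row is returned: "会议：Meeting" was indexed as a
// single term and does not match the query "meeting".
console.log(rows);
```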

Yes, I think that would solve this. But is that possible? FTS tables are automatically generated from the notes table, no?

No, they are generated from a notes_normalized table, which is populated by the app. There's a normalizeNote_() function that's used to normalize the title and body, so I think you'd just need to change that to filter out the unsupported characters.
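
Here is a rough sketch of what that filter could look like; normalizeNote_() is the real function mentioned above, but filterUnsupportedCharacters is just a hypothetical helper to illustrate the idea, not the actual fix:

```typescript
// Hypothetical helper: replace Unicode punctuation and symbols with spaces,
// so the FTS4 "simple" tokenizer no longer glues them into the surrounding
// term. Something like this could be applied to the title and body inside
// normalizeNote_().
function filterUnsupportedCharacters(text: string): string {
  // \p{P} = punctuation, \p{S} = symbols; this covers the fullwidth （ ） ：
  // characters from the example as well as regular ASCII punctuation.
  return text.replace(/[\p{P}\p{S}]+/gu, ' ');
}

// "会议：Meeting（Asian）" becomes "会议 Meeting Asian ", so FTS indexes
// 会议, Meeting and Asian as separate terms, and a search for "meeting" matches.
console.log(filterUnsupportedCharacters('会议：Meeting（Asian）'));
```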

Ah I see, thanks! I'll give that a try then.