I was experimenting with the database to see how long different search queries take and got some surprising results.
Joplin uses FTS4. So, in theory, the search should be very fast.
So I tested it out on a database with 10294 notes.
select count(notes.id) from notes;
10294
SELECT
notes_fts.id,
notes_fts.title,
offsets(notes_fts) AS offsets
FROM notes_fts
WHERE notes_fts MATCH "energy"
Result: 2363 rows returned in 265ms
265ms is quite fast when compared to a normal query using LIKE
.
SELECT
notes.title,
notes.user_updated_time,
notes.is_todo,
notes.todo_completed,
notes.parent_id
FROM notes
WHERE notes.body LIKE "%energy%"
OR notes.title LIKE "%energy%";
Result: 2435 rows returned in 2384ms
As expected FTS4, by itself, is much faster (almost 10 times in this case), than a normal search.
The problem, however, is that Joplin has a LEFT JOIN
in its search query.
And this doesn’t seem to scale well.
SELECT
notes_fts.id,
notes_fts.title,
offsets(notes_fts) AS offsets,
notes.title,
notes.user_updated_time,
notes.is_todo,
notes.todo_completed,
notes.parent_id
FROM notes_fts
LEFT JOIN notes ON notes_fts.id = notes.id
WHERE notes_fts MATCH "energy";
Result: 2363 rows returned in 3903ms
Notice the almost 4000ms delay. This is even worse than doing a normal query which took about 2400ms!
Isn’t the LEFT JOIN
a bottleneck? It’s not only negating any improvements from using FTS but also actively hampering performance. Can’t we get rid of it by keeping the needed fields in notes_fts itself as nonindexed fields and adding triggers to keep it up to date?
Here is my joplindev-desktop folder which I used for this experiment.
Just make sure to clear the cache before running each query.
echo 3 | sudo tee /proc/sys/vm/drop_caches
Update: After implementing the required changes and setting a timer like so
const startTime = Date.now();
const rows = await this.db().selectAll(sql, [query]);
const timeTaken = Date.now() - startTime;
console.log(`Time taken for query '${query}': ${timeTaken}ms`)
I ran some tests. (Empty the cache before starting Joplin)
Old implementation:
Time taken for query 'hour': 1052ms
Rows returned 2295
Time taken for query 'thanks': 806ms
Rows returned 2374
Time taken for query 'please': 721ms
Rows returned 4013
Time taken for query 'regards': 81ms
Rows returned 364
New implementation:
Time taken for query 'hour': 160ms
Rows returned 2295
Time taken for query 'thanks': 147ms
Rows returned 2374
Time taken for query 'please': 179ms
Rows returned 4013
Time taken for query 'regards': 138ms
Rows returned 364
So I think it did some good.
All changes in the pull request: https://github.com/laurent22/joplin/pull/3213