Database speed test results

I was experimenting with the database to see how long different search queries take and got some surprising results.

Joplin uses FTS4. So, in theory, the search should be very fast.
So I tested it out on a database with 10294 notes.

select count(notes.id) from notes;

10294

SELECT
notes_fts.id,
notes_fts.title,
offsets(notes_fts) AS offsets
FROM notes_fts
WHERE notes_fts MATCH "energy"

Result: 2363 rows returned in 265ms

265ms is quite fast when compared to a normal query using LIKE.

SELECT
notes.title,
notes.user_updated_time,
notes.is_todo,
notes.todo_completed,
notes.parent_id
FROM notes
WHERE notes.body LIKE "%energy%"
OR notes.title LIKE "%energy%";

Result: 2435 rows returned in 2384ms

As expected FTS4, by itself, is much faster (almost 10 times in this case), than a normal search.

The problem, however, is that Joplin has a LEFT JOIN in its search query.
And this doesn’t seem to scale well.

SELECT
notes_fts.id,
notes_fts.title,
offsets(notes_fts) AS offsets,
notes.title,
notes.user_updated_time,
notes.is_todo,
notes.todo_completed,
notes.parent_id
FROM notes_fts
LEFT JOIN notes ON notes_fts.id = notes.id
WHERE notes_fts MATCH "energy";

Result: 2363 rows returned in 3903ms

Notice the almost 4000ms delay. This is even worse than doing a normal query which took about 2400ms!

Isn’t the LEFT JOIN a bottleneck? It’s not only negating any improvements from using FTS but also actively hampering performance. Can’t we get rid of it by keeping the needed fields in notes_fts itself as nonindexed fields and adding triggers to keep it up to date?

Here is my joplindev-desktop folder which I used for this experiment.


Just make sure to clear the cache before running each query.
echo 3 | sudo tee /proc/sys/vm/drop_caches

Update: After implementing the required changes and setting a timer like so

const startTime = Date.now();
const rows = await this.db().selectAll(sql, [query]);
const timeTaken = Date.now() - startTime;
console.log(`Time taken for query '${query}': ${timeTaken}ms`)

I ran some tests. (Empty the cache before starting Joplin)

Old implementation:

Time taken for query 'hour': 1052ms
Rows returned 2295

Time taken for query 'thanks': 806ms
Rows returned 2374

Time taken for query 'please': 721ms
Rows returned 4013

Time taken for query 'regards': 81ms
Rows returned 364

New implementation:

Time taken for query 'hour': 160ms
Rows returned 2295

Time taken for query 'thanks': 147ms
Rows returned 2374

Time taken for query 'please': 179ms
Rows returned 4013

Time taken for query 'regards': 138ms
Rows returned 364

So I think it did some good.

All changes in the pull request: https://github.com/laurent22/joplin/pull/3213

1 Like

just to tell others who don't know SQL, what is talked about

2 Likes

Thanks for looking into this @naviji, it’s a good idea to check performance because we don’t want this affected by an update. In fact, have you look at JavaScript benchmarking libraries? Perhaps there’s a case to create a few tests, see how they perform with the current version and then, as you make changes, check that the performance is not changed.

It could be just a few simple tests - searching one word, searching one word with wildcard, searching in title, etc.