High disk read of joplin-server

Operating system

Linux

Joplin version

2.14.2

What issue do you have?

I'm running joplin server in a docker. It constantly read file at 70~100MB/s, which can be confirmed by htop, iotop and docker stats.

Currently I only found this error

joplin-server  | 09:31:51 0|app  | 2024-09-19 09:31:43: [error] TaskService: Failed running task #12 (Process emails) KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full
. Are you missing a .transacting(trx) call?
joplin-server  | 09:31:51 0|app  |     at Object.queryBuilder (/home/joplin/packages/server/node_modules/knex/lib/knex-builder/make-knex.js:112:26)
joplin-server  | 09:31:51 0|app  |     at createQueryBuilder (/home/joplin/packages/server/node_modules/knex/lib/knex-builder/make-knex.js:320:26)
joplin-server  | 09:31:51 0|app  |     at TaskStateModel.knex [as db_] (/home/joplin/packages/server/node_modules/knex/lib/knex-builder/make-knex.js:101:12)
joplin-server  | 09:31:51 0|app  |     at TaskStateModel.<anonymous> (/home/joplin/packages/server/src/models/TaskStateModel.ts:16:15)
joplin-server  | 09:31:51 0|app  |     at Generator.next (<anonymous>)
joplin-server  | 09:31:51 0|app  |     at /home/joplin/packages/server/dist/models/TaskStateModel.js:8:71 {
joplin-server  | 09:31:51 0|app  |   sql: undefined,
joplin-server  | 09:31:51 0|app  |   bindings: undefined,
joplin-server  | 09:31:51 0|app  |   originalStack: 'KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?\n' +
joplin-server  | 09:31:51 0|app  |     '    at Client_SQLite3.acquireConnection (/home/joplin/packages/server/node_modules/knex/lib/client.js:332:26)\n' +
joplin-server  | 09:31:51 0|app  |     '    at runNextTicks (node:internal/process/task_queues:60:5)\n' +
joplin-server  | 09:31:51 0|app  |     '    at processTimers (node:internal/timers:509:9)\n' +
joplin-server  | 09:31:51 0|app  |     '    at Runner.ensureConnection (/home/joplin/packages/server/node_modules/knex/lib/execution/runner.js:305:28)\n' +
joplin-server  | 09:31:51 0|app  |     '    at Runner.run (/home/joplin/packages/server/node_modules/knex/lib/execution/runner.js:30:19)'
joplin-server  | 09:31:51 0|app  | }

And later I saw this consistently in the log: 2024-09-19 10:30:00: [warn] TaskService: Tried to start #12 (Process emails) but it was already running

Note: I saw a similar error/exception for #11 (Process shared items), but followup runs were successful.

I also verified that there are enough free memory, so this is probaby not about swap.

I'm still investigating, and I'll try v3 later.

Questions:

  • Is this a known issue?
  • What is the easist way of disabling task #12?
  • How can I check which file is being read? strace showed something about sqlite journals, but I think it's for #11

About swap, I only verified the system-wide free memory. Perhaps it is possible that joplin server is using all memory within the docker limit? Not sure how I can check that.

I investigated further. It seems to be a general problem of docker. I'm closing it for now.
Continuing invetigation.

I observed that task #11 is constantly failing:

joplin-server  | 08:31:16 0|app    | 2024-09-20 08:31:08: [error] TaskService: Failed running task #11 (Process shared items) KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is proba
bly full. Are you missing a .transacting(trx) call?
joplin-server  | 08:31:16 0|app    |     at Object.queryBuilder (/home/joplin/packages/server/node_modules/knex/lib/knex-builder/make-knex.js:112:26)
joplin-server  | 08:31:16 0|app    |     at createQueryBuilder (/home/joplin/packages/server/node_modules/knex/lib/knex-builder/make-knex.js:320:26)
joplin-server  | 08:31:16 0|app    |     at TaskStateModel.knex [as db_] (/home/joplin/packages/server/node_modules/knex/lib/knex-builder/make-knex.js:101:12)
joplin-server  | 08:31:16 0|app    |     at TaskStateModel.<anonymous> (/home/joplin/packages/server/src/models/TaskStateModel.ts:16:15)
joplin-server  | 08:31:16 0|app    |     at Generator.next (<anonymous>)
joplin-server  | 08:31:16 0|app    |     at /home/joplin/packages/server/dist/models/TaskStateModel.js:8:71 {
joplin-server  | 08:31:16 0|app    |   sql: undefined,
joplin-server  | 08:31:16 0|app    |   bindings: undefined,
joplin-server  | 08:31:16 0|app    |   originalStack: 'KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?\n' +
joplin-server  | 08:31:16 0|app    |     '    at Client_SQLite3.acquireConnection (/home/joplin/packages/server/node_modules/knex/lib/client.js:332:26)\n' +
joplin-server  | 08:31:16 0|app    |     '    at runNextTicks (node:internal/process/task_queues:60:5)\n' +
joplin-server  | 08:31:16 0|app    |     '    at processTimers (node:internal/timers:509:9)\n' +
joplin-server  | 08:31:16 0|app    |     '    at Runner.ensureConnection (/home/joplin/packages/server/node_modules/knex/lib/execution/runner.js:305:28)\n' +
joplin-server  | 08:31:16 0|app    |     '    at Runner.run (/home/joplin/packages/server/node_modules/knex/lib/execution/runner.js:30:19)'
joplin-server  | 08:31:16 0|app    | }

Meanwhlie, the failing task seems to be the same one, because new tasks cannot be scheduled:

joplin-server  | 08:38:07 0|app    | 2024-09-20 08:38:07: [warn] TaskService: Tried to start #11 (Process shared items) but it was already running
joplin-server  | 08:38:18 0|app    | 2024-09-20 08:38:18: [warn] TaskService: Tried to start #11 (Process shared items) but it was already running
joplin-server  | 08:38:28 0|app    | 2024-09-20 08:38:28: [warn] TaskService: Tried to start #11 (Process shared items) but it was already running
joplin-server  | 08:38:38 0|app    | 2024-09-20 08:38:38: [warn] TaskService: Tried to start #11 (Process shared items) but it was already running
joplin-server  | 08:38:48 0|app    | 2024-09-20 08:38:48: [warn] TaskService: Tried to start #11 (Process shared items) but it was already running
joplin-server  | 08:38:58 0|app    | 2024-09-20 08:38:58: [warn] TaskService: Tried to start #11 (Process shared items) but it was already running

[Update] No that's not it. I remove the scheduled tasks and the disk IO is still high.
Now it still seems like it's a general issue of docker itself.

@laurent I have a theory now.

A task maybe scheduled by setInterval, which accumulates if the callback is too slow.

To avoid overlapping tasks, a new run is canceled if the previous one is still running.

However, the task state is loaded from disk, and maybe worse, it seems that it's loaded twice?.

This disk loading part can be slow, e.g. when the system is under high disk IO. In this case we'll spawn a new disk loader (i.e. database connection) every few seconds.

[Speculated, Not fully verified] Later, even if there is no longer external disk IO, the pending database connections themselves may still produce high disk IO, blocking new connections & scheduled tasks.

This matches with my observations:

  • joplin server only has high disk read, not write
  • lsof and strace only shows files about sqlite
  • This high-disk-io issue only happenes when my other matenance scripts (backup, indexing) are running.

To reproduce the issue, I just start multiple processed that read the disk simutaneouly. (I use rclone md5sum /some/dir myself)

I plan to verify my theory by disabling tasks #11 and #12.

The sqlite file itself seems very small. I wonder if it is possible/appropriate to keep it in memory.

Another idea is to replace setInterval with setTimeout, where each run schedules the next one at the end.