Joplin Cloud temporary performance issues (30 Nov) [Solved]

Edit: The autovacuuming is finally complete and the server is back to normal. Sorry about the disruption! The main takeaway is that deleting a lot of data at once can have an aftershock effect on the database that's difficult to control. Next time I'll delete the data gradually, which will allow the autovacuum process to keep up.


As mentioned on Twitter, a maintenance operation is in progress in Joplin Cloud and in the meantime the server will not perform at full capacity.

What happened is that over the past several days I've migrated all the items (notes, resources, etc.) to S3 - that was a massive migration with over 200 GB of data and millions of items, but it all went very well without having to stop the service, and I don't think anyone even noticed.

So that went well, and today I thought I'd do the easy part, which is to delete the now-duplicate items (since they are on S3) from the database. Initially it went well, but eventually it started slowing down, and even after stopping the deletion and restarting the service, the database was still working hard at 100% CPU.

What I then discovered is that the Postgres autovacuum process couldn't keep up with the deletions and as a result has been working hard on the database for the past few hours.

I've now stopped deleting items so that Postgres can finish vacuuming, but it's taking a long time. I estimate it's about 40% done now, so it may take another 2-3 hours before it's fully done. In the meantime the service will unfortunately be slower and certain requests might fail. Despite the occasional failures, no data is being lost of course, and your applications will simply retry later if sync fails.
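To illustrate what "retry later" can look like in general, here is a minimal sketch of retrying a failed operation with exponential backoff. It is not the actual Joplin sync code: `TransientError`, `retry_with_backoff` and the delay values are all hypothetical names and numbers chosen for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error standing in for a temporary failure such as a 504."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on TransientError with doubling delays.

    The delay before retry number n (counting from 0) is base_delay * 2**n.
    The last failure is re-raised once max_attempts is reached.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is only there so the waiting can be replaced in tests; a real client would normally leave it at the default.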

I didn't expect that. I thought the hard part was done, but that autovacuuming thing caught me by surprise and there's no easy way to stop it. So it's best to just let it finish, and I'll investigate later how to delete the remaining data without this issue happening again.
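For anyone curious, deleting slowly over time could look roughly like the sketch below: remove a small batch, pause so autovacuum has a chance to catch up, and repeat until nothing is left. This is only an illustration under assumptions, not what Joplin Cloud actually runs. The SQL, the `items` table, the `migrated_to_s3` column, the batch size and the pause length are all hypothetical.

```python
import time
from typing import Callable

# Hypothetical Postgres statement for one batch. The table and column names
# are assumptions made for this example, not the real Joplin Server schema.
DELETE_BATCH_SQL = """
DELETE FROM items
WHERE id IN (
    SELECT id FROM items
    WHERE migrated_to_s3 = TRUE
    LIMIT %(batch_size)s
)
"""


def delete_in_batches(
    delete_batch: Callable[[int], int],
    batch_size: int = 1000,
    pause_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Delete rows in small batches, pausing between batches.

    delete_batch(batch_size) must delete at most batch_size rows and return
    how many rows it actually deleted. The loop stops when a batch deletes
    nothing, and the total number of deleted rows is returned.
    """
    total = 0
    while True:
        deleted = delete_batch(batch_size)
        total += deleted
        if deleted == 0:
            return total
        # Give the database time to vacuum dead rows before the next batch.
        sleep(pause_seconds)
```

In practice `delete_batch` would wrap a database call that executes something like `DELETE_BATCH_SQL`, commits, and returns the affected row count. Keeping it as a callback means the pacing logic can be tried out without a database.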

I'll provide an update later on. Again apologies for any inconvenience!

Thanks for the update, I've literally just signed up for Joplin Cloud but when I check the sync configuration I get a 504 gateway timeout. Is it related to this? Should I just wait?

You can try again actually, I think you might have been unlucky as I see it's performing relatively well at the moment.

I'm in the same boat as marcoc, just signed up today and having trouble getting synced. I'm currently getting the following:

Completed: 30/11/2021 14:06 (358s)
Last error: Error: Unknown error: <html> <head><title>504 Gateway Time-out</title></head> <body> <center><h1>504 Gateway Time-out</h1></center> <hr><center>nginx/1.18.0 (Ubuntu)</center> </body> </html>

The autovacuuming is finally complete, so please give it another try. I'll keep an eye on the server, but as far as I can see CPU usage and response time are back to normal.

Looks much better now. Thank you

Thank you for the transparency. I was getting failures earlier, but they seem to be fixed.