Edit: The autovacuuming is finally complete and the server is back to normal. Sorry about the disruption! The main takeaway is that deleting a lot of data can have an aftershock effect on the database that's difficult to control. Next time I'll simply delete the data slowly over time, which will allow the autovacuum process to keep up.
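For the curious, "deleting slowly over time" usually means batched deletes with a pause between batches, so autovacuum can reclaim the dead rows each batch leaves behind instead of falling hopelessly behind one giant delete. Here's a minimal sketch of the idea — the `items` table and `migrated` flag are made up for illustration, and it runs against an in-memory SQLite database as a stand-in for Postgres:

```python
import sqlite3
import time

def delete_in_batches(conn, batch_size=1000, pause=0.1):
    """Delete flagged rows in small batches, sleeping between batches so
    background maintenance (e.g. Postgres autovacuum) can keep up with
    the dead rows each batch creates."""
    total = 0
    while True:
        # Pick the next batch of ids to delete (hypothetical schema).
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM items WHERE migrated = 1 LIMIT ?", (batch_size,))]
        if not ids:
            break
        placeholders = ",".join("?" * len(ids))
        conn.execute(f"DELETE FROM items WHERE id IN ({placeholders})", ids)
        conn.commit()
        total += len(ids)
        time.sleep(pause)  # give the database room to breathe
    return total

# Demo: 10,000 rows, half of them flagged as already migrated.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, migrated INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, i % 2) for i in range(10000)])
deleted = delete_in_batches(conn, batch_size=500, pause=0)
print(deleted)  # → 5000
```

The batch size and pause would need tuning against the real workload; the point is just that the delete rate becomes something you control.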
As mentioned on Twitter, a maintenance operation is in progress in Joplin Cloud and in the meantime the server will not perform at full capacity.
What happened is that over the past several days I've migrated all the items (notes, resources, etc.) to S3 - that was a massive migration with over 200 GB of data and millions of items, but it all went very well without having to stop the service, and I don't think anyone even noticed.
So that went well, and today I thought I'd do the easy part, which was to delete the now-duplicate items (since they are on S3) from the database. Initially it went fine, but eventually things started slowing down, and even after stopping the deletion and restarting the service, the database was still working hard at 100% CPU.
What I then discovered is that the Postgres autovacuum process couldn't keep up with the deletions and as a result has been working hard on the database for the past few hours.
I've now stopped deleting items so that Postgres can finish vacuuming, but it's taking a long time. I estimate it's about 40% done now, so it may take another 2-3 hours before it's fully complete. In the meantime the service will unfortunately be slower and certain requests might fail - although despite the occasional failures, no data is being lost of course, and your applications will simply retry later if a sync fails.
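If you ever need to check this on your own Postgres instance: since Postgres 9.6 the built-in `pg_stat_progress_vacuum` view reports roughly how far along a running vacuum is, for example:

```sql
SELECT relid::regclass AS table,
       phase,
       heap_blks_scanned,
       heap_blks_total
FROM pg_stat_progress_vacuum;
```

Dividing `heap_blks_scanned` by `heap_blks_total` gives a rough percentage for the current phase (it's not a precise overall estimate, since index cleanup isn't counted the same way).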
I didn't expect that. I thought the hard part was done, but the autovacuuming caught me by surprise, and there's no easy way to stop it. So it's best to just let it finish, and I'll investigate later how to delete the remaining data without this issue happening again.
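Besides spreading the deletions out, one common mitigation (not a decision yet, just something I may look into) is to give autovacuum more headroom in `postgresql.conf` so it can process dead rows faster. These are real Postgres settings, but the values below are purely illustrative:

```
# postgresql.conf — illustrative values, tune for your own workload
autovacuum_vacuum_cost_limit = 2000   # default inherits vacuum_cost_limit (200)
autovacuum_vacuum_cost_delay = 1ms    # shorter sleeps between chunks of work
autovacuum_max_workers = 5            # default 3
```

The trade-off is that a more aggressive autovacuum uses more I/O during normal operation, so it's something to test rather than blindly apply.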
I'll provide an update later on. Again apologies for any inconvenience!