Improving the sync process in Joplin

(will be posted on Patreon too)

The latest version of Joplin includes a mechanism to upgrade the structure of a sync target. When you startup the app you will be asked to upgrade before being able to sync. Once you start the process, the app will briefly display an information screen, upgrade the sync target, and then restart the app. You’ll then be able to sync with the new sync target format. That first upgrade is quite simple as the goal for now is to put the mechanism in place and verify that it works well.

From a user perspective this feature doesn’t do anything visible, although it caused some issues, so one might wonder why it’s even there. This post is meant to clarify this.

The structure of the sync target hasn’t really changed since the day Joplin was released. It works well however it has some shortcomings that should be fixed eventually for sync to remain performant.

There are also various improvements that could be made but were not previously possible due to the lack of an upgrade mechanism. I have listed below the 5 main limitations or issues with the current sync process and how they could be fixed:

No upper limit on the number of items

Joplin’s UI works well even with millions of notes, however the sync target will keep getting slower and slower as more files are added to it. File systems often have a limit to the number of files they can support in a directory. One user also has reached the limit of 150,000 items on OneDrive.

For now, this is not a big issue because most users don’t have millions of items, but as more web pages are being clipped (clipped pages often contain many small resources and images) and more note revisions are created (one note can have hundreds of revisions), this issue might start affecting more users.

One way to solve this issue would be to split the sync items into multiple directories. For example if we split the main directory into 100 sub-directories, it will be possible to have 15,000,000 OneDrive items instead of 150,000. Another way would be to support note archiving, as described below. How exactly we’ll handle this problem is still to be defined, but there are certainly ways.

Not possible to prioritise downloads

Currently, when syncing, the items are downloaded in a random way. So it might download some notes, then some tags and notebooks, then back to downloading notes, etc. For small sync operations it doesn’t matter, but large ones, like when setting up a new device, it is very inefficient. For example, the app might download hundreds of note revisions or tags, but won’t display anything for a while because it won’t have downloaded notebooks or notes.

A simple improvement would be to group the items by type on the sync target. So all notebook items together, all tags together, etc. Doing so means when syncing we can first download the notebooks, then the notes, which means something will be displayed almost immediately in the app, allowing the user to start using it. Then later less important items like tags or note revisions will be downloaded.

End-to-end encryption is hard to setup

Currently, the encryption settings is a property of the clients. What it means it that when you setup a new client, it doesn’t know whether the other clients use encryption or not. It’s going to guess more or less based on the data on the sync target. You can also force it to use encryption but this has drawbacks and often mean a new master key is going to be created, even though there might already be one on the sync target.

E2EE works well once it’s setup, but doing so can be tricky and possibly confusing - if you didn’t follow this guide to the letter, you might end up with multiple master keys, or sending decrypted notes to an encrypted target.

A way to solve this would be to make the E2EE settings a property of the sync target. Concretely there would be a file that tells if E2EE is enabled or not, and maybe some way to quickly get the master key. It would simplify setting up encryption a lot and make it more secure (because you won’t be able to send non-encrypted notes to an encrypted sync target). When you setup a new client, the client will know immediately if it’s an encrypted target or not and set the client accordingly.

Old notes that never change should be handled differently

It would be more efficient to treat old notes differently by allowing the user to “archive” them. An archived note would be read-only. Then one idea could be to group all these archived notes into a ZIP file on the sync target. Doing so means that the initial sync would be much faster (instead of downloading hundred of small files, which is slow, it will download one large file, which is fast). It would also make the structure more scalable - you could keep several years of archived notes on the sync target while keeping sync fast and efficient.

The resource directory should be renamed

The folder that contains file attachments is named “.resources” on the sync target. This causes troubles because certain platforms will hide directories that start with dot “.”, and perhaps they will be excluded from backup or skipped when moved somewhere else. Being able to upgrade the sync target means we can rename this folder to just “resources” instead.

Conclusion

That’s obviously a lot of possible improvements and it won’t be done overnight, but having the sync upgrade mechanism in place means we can start considering these options. Some of these, such as renaming the “resources” dir are simpler and could be done relatively soon. Perhaps other more complex ones will be group within one sync target upgrade to minimise disruption. In any case, I hope this clarifies the reason for this recent sync upgrade and that it gives some ideas of what to expect in the future.

12 Likes