About file organization on the server side

When I back up my data from OneDrive, I find that Joplin uses a flat file layout for all of the JSON data. I have more than 38,000 items. That is very hard to work with in a local file system, for example with grep or any pipe command.

I think ccache is a good teacher here: it uses the local file system to store millions of objects for cached build targets, and it handles them with sub-folders such as /0/{0,1,2,3,4,5,6,7,8,9}/a.o. For example, create level-1 sub-folders from 00 to 99, and under each of the 100 level-1 sub-folders create level-2 sub-folders from 00 to 99; that gives 100 × 100 = 10,000 folders. If we put about 100 JSON files in each of the 10,000 sub-folders, we can hold 100 × 10,000 = 1,000,000 items. An item path would look like /01/20/14a6b778392b43d69ea23551cc8df744.json.
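A minimal sketch of how such a path could be derived from an item ID (the function name `shardedPath` and the two-characters-per-level split are my own assumptions for illustration, not Joplin's actual layout):

```ts
import * as path from 'path';

// Build a two-level sharded path from an item ID, e.g.
// '14a6b778392b43d69ea23551cc8df744' -> '14/a6/14a6b778392b43d69ea23551cc8df744.json'.
// Two hex characters per level gives 256 x 256 buckets; the exact split is an assumption.
function shardedPath(baseDir: string, itemId: string): string {
	const level1 = itemId.slice(0, 2);
	const level2 = itemId.slice(2, 4);
	return path.join(baseDir, level1, level2, `${itemId}.json`);
}

console.log(shardedPath('/sync-target', '14a6b778392b43d69ea23551cc8df744'));
// -> /sync-target/14/a6/14a6b778392b43d69ea23551cc8df744.json
```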

It is easy to handle 100 JSON files under one folder, and easy to switch between sub-folders. Syncing one sub-folder at a time should perform better than checking every JSON file in a single flat folder, and it also allows sub-folders to be synced in parallel.
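A rough illustration of what per-shard parallel syncing could look like (the `syncFolder` function and the shard listing are hypothetical; a real sync would compare remote and local state rather than just list files):

```ts
import { promises as fs } from 'fs';
import * as path from 'path';

// Hypothetical per-folder sync step: each shard holds on the order of 100 items,
// so the work per call stays small and predictable.
async function syncFolder(shardDir: string): Promise<void> {
	const files = await fs.readdir(shardDir);
	// ... compare timestamps / hashes for these ~100 files here ...
	console.log(`${shardDir}: ${files.length} items checked`);
}

async function syncAllShards(baseDir: string): Promise<void> {
	for (const l1 of await fs.readdir(baseDir)) {
		const level2 = await fs.readdir(path.join(baseDir, l1));
		// Sync the second-level shards of this bucket concurrently.
		await Promise.all(level2.map(l2 => syncFolder(path.join(baseDir, l1, l2))));
	}
}
```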

I think this is better than a flat file layout once the number of items grows toward 1,000,000 or more.

Yes, directory sharding is the obvious solution. The implementation is the part that's not obvious, since migrating all these files on all the sync targets of all users is quite tricky. I'm hoping to get there some day, though I have other smaller sync target migrations planned first.

Yes, data safety is very important for end users. Let me share my idea:

1. Create the new structure and copy files into the new sub-folders following the rule, based on the standard file name: for example, 14a6b778392b43d69ea23551cc8df744.json is copied to /14/a6/14a6b778392b43d69ea23551cc8df744.json. (A sketch of the copy and lookup steps is shown after this list.)

2. Joplin checks the flat file first and then checks the sub-folder each time; if it finds the new copy in the sub-folder, it uses that one and moves the old one to a backup folder, one item at a time. If we hit an issue, we still have a backup copy to restore from.

3. When no old files remain, give the end user a choice, such as downloading the backup files locally or cleaning them up.
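A rough sketch of steps 1 and 2, assuming items live as flat `<id>.json` files in the sync target root (function names like `migrateItem` and `readItem`, and the `.migration-backup` folder, are made up for illustration, not Joplin's actual API):

```ts
import { promises as fs } from 'fs';
import * as path from 'path';

// Step 1: copy a flat item into its sharded location, e.g.
// 14a6b778....json -> 14/a6/14a6b778....json. The flat copy is left in place for now.
async function migrateItem(baseDir: string, fileName: string): Promise<void> {
	const id = path.basename(fileName, '.json');
	const shardDir = path.join(baseDir, id.slice(0, 2), id.slice(2, 4));
	await fs.mkdir(shardDir, { recursive: true });
	await fs.copyFile(path.join(baseDir, fileName), path.join(shardDir, fileName));
}

// Step 2: read an item, using the sharded copy when it exists; in that case the old
// flat file is moved into a backup folder instead of being deleted, so it can be restored.
async function readItem(baseDir: string, id: string): Promise<string> {
	const flatPath = path.join(baseDir, `${id}.json`);
	const newPath = path.join(baseDir, id.slice(0, 2), id.slice(2, 4), `${id}.json`);

	try {
		const sharded = await fs.readFile(newPath, 'utf8');
		// New copy exists: archive the old flat file if it is still there.
		const backupDir = path.join(baseDir, '.migration-backup');
		await fs.mkdir(backupDir, { recursive: true });
		await fs.rename(flatPath, path.join(backupDir, `${id}.json`)).catch(() => {});
		return sharded;
	} catch {
		// No sharded copy yet: fall back to the flat layout.
		return fs.readFile(flatPath, 'utf8');
	}
}
```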

