
Many files? (680 files for 85 notes)

I've recently started migrating from OneNote to Joplin, and so far I have 85 notes saved. However, I've spotted that the folder where all my data is saved contains some 680 files (all .md files). Is this normal?

I did look briefly at Obsidian, and that seems to save each page/note as a file (which is what I'd expect). It looks like Joplin saves each line/object as a file. Is this by design?

Thanks
Doug

That is your sync target. Your local notes are saved to the SQLite database in your profile (there are no .md files involved).
The sync target is basically just a file-based database; it isn't designed to be "used" in any sense other than by Joplin. Do not edit anything inside there, as you are very likely to break your sync.
The reason it has so many files is that it deals with note revisions and a bunch of other metadata stuff.
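If you want to see what those files actually are without touching them, a read-only tally is one option. This is a minimal sketch, not an official tool: it assumes each sync item is a `.md` file whose trailing metadata includes a `type_: N` line, and the code-to-name mapping in `TYPE_NAMES` is an assumption that should be checked against your Joplin version. The function name `count_item_types` is made up for this example.

```python
import tempfile
from collections import Counter
from pathlib import Path

# Assumed mapping of `type_` codes to item kinds; verify against your Joplin version.
TYPE_NAMES = {1: "note", 2: "folder", 4: "resource", 5: "tag", 6: "note_tag", 13: "revision"}


def count_item_types(sync_dir):
    """Tally the .md files in sync_dir by their `type_:` metadata line (read-only)."""
    counts = Counter()
    for path in Path(sync_dir).glob("*.md"):
        item_type = "unknown"
        text = path.read_text(encoding="utf-8", errors="replace")
        # Metadata is assumed to sit at the end of the file as `key: value` lines,
        # so scan from the bottom and stop at the first `type_:` line found.
        for line in reversed(text.splitlines()):
            if line.startswith("type_:"):
                code = line.split(":", 1)[1].strip()
                if code.isdigit():
                    item_type = TYPE_NAMES.get(int(code), f"type {code}")
                break
        counts[item_type] += 1
    return counts


# Demo on a throwaway folder with fake items, so nothing real is touched.
demo_dir = tempfile.mkdtemp()
Path(demo_dir, "aaa.md").write_text("My note\n\nSome text\n\nid: aaa\ntype_: 1", encoding="utf-8")
Path(demo_dir, "bbb.md").write_text("id: bbb\ntype_: 13", encoding="utf-8")
print(dict(count_item_types(demo_dir)))
```

Pointing `count_item_types` at a copy of a real sync folder should show how much of the file count is revisions and other metadata rather than notes.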


Thanks for replying back to me.

It's such a shame it does this. I so wanted a good replacement for OneNote, and this looked like a winner until I saw this: a folder with thousands of files where, should any one of them become corrupt, there is a knock-on effect for the whole app.

The above, together with the totally crazy notes restore procedure, means I think I'll stop using Joplin. What application requires an end user to go digging into the database (and to install a DB management tool as well) just to restore deleted items? It's crackers! :frowning_face:

I don't quite agree with some of your points here. No, it isn't perfect but I think there is some lack of understanding here.

To start with you don't need to sync the folder locally unless you are using the local file sync. For example if using NextCloud you don't need to have the Joplin sync target on your own PC, it only has to exist on NextCloud as Joplin accesses it directly, not locally.

Not really a problem as far as I'm aware. (Please somebody correct me if I'm wrong here) Essentially the sync target stores "changes" rather than notes so unless something is manually editing the older revisions (which shouldn't be happening) then nothing will break.
And when I say "break" it manifests itself as a refusal to sync until you fix it, not that you start downloading garbled rubbish and corrupting your local notes.
I've also not seen any evidence that it has ever been corrupted when used properly, and even if it can be, it is a relatively simple task to replace the data from the most up-to-date client, as it can "push" its local data, replacing everything on the target.

Yes it isn't ideal but

  1. it isn't meant to be a regular task, this is meant to be an "oh god I really messed up" type fix and
  2. there is ongoing work to implement a "trash" option (either natively or as a plugin I don't remember) and ultimately there is no permanent replacement for proper backing up of your data.

No number of levels of deletion protection (warnings -> trash -> final warning -> notes abyss) is going to stop somebody deleting things by accident. At least there is a method of getting things back, even if it isn't user friendly. However, there is no real reason that I can see why this couldn't be further developed into the frontend - it just depends on time and who is willing to put in the effort.

It is the nature of open source software, no it isn't perfect and yes there are features that can and probably should be added but ultimately development is still going on at a pretty rapid pace, tons of new features are being added all the time in the core application as well as via community plugins and third party tools - it is already in a very different place to where it was only a year ago.


I also think this is a bad decision. I have 3000+ notes, but the sync generated 30,000+ .md files of less than 1 KB each, so synchronization is very slow. The smaller and more numerous the files, the slower the transfer (it is well known that 4 KB files transfer much more slowly than ordinary files). If the number of files were reduced and the file size increased appropriately, the transfer speed should improve greatly.
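The small-file effect described here can be put into a back-of-the-envelope model: total time is a fixed cost per file plus the time to move the bytes. The sketch below is only an illustration; the overhead and bandwidth figures are invented for the example, not measurements of Joplin or any sync service, and `estimated_sync_seconds` is a hypothetical helper.

```python
def estimated_sync_seconds(file_count, total_bytes, per_file_overhead_s, bandwidth_bytes_per_s):
    """Fixed per-file cost (request round trips, metadata) plus raw transfer time."""
    return file_count * per_file_overhead_s + total_bytes / bandwidth_bytes_per_s


# Hypothetical figures for illustration only.
FILES = 30_000
TOTAL_BYTES = FILES * 1_000   # about 1 KB per file
OVERHEAD_S = 0.05             # assumed 50 ms of per-file overhead
BANDWIDTH = 1_000_000         # assumed 1 MB/s link

many_small = estimated_sync_seconds(FILES, TOTAL_BYTES, OVERHEAD_S, BANDWIDTH)
one_archive = estimated_sync_seconds(1, TOTAL_BYTES, OVERHEAD_S, BANDWIDTH)
print(f"many small files: {many_small:.0f} s, single archive: {one_archive:.2f} s")
```

With these made-up numbers the per-file term dominates: the same bytes take far longer as 30,000 tiny files than as one archive, which is also why the measured throughput can look very low even on a fast link.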

I'm not sure I understand. The number of files relates to how many note revisions you have, and I'm not sure how the sync would work if you were to somehow consolidate the files.
Plus, the slow sync should only occur on a client's first sync; the clients don't need to access the full set of files constantly to maintain their local data.

Also, this is (partially) why Joplin Server/Cloud was created: to deal with the inefficiencies of transferring lots of small files via protocols not designed for it, like WebDAV.

I often add some content to many notes. Does that mean that more and more .md files will be generated?

Is it possible to store all revisions of a note in a single file and synchronize that file when it is modified? Although there would be many versions, it couldn't get very large because it is a text file.

I have also used Obsidian for a while. There it is possible to store a note as a single file and still keep many historical versions.

Yes, if you modify a note and have a look in your sync target sorted by date then you should be able to view exactly what Joplin is creating.

The way I tend to think of the sync is like a database transaction log where you provide a full history of all the "stuff" that has happened to a particular note for the other clients to integrate it into their own database.
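To make the transaction-log analogy concrete, here is a toy replay loop. It is only an illustration of the mental model, not how Joplin's sync is implemented; the `Change` record and `replay` function are invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    """One hypothetical log entry: what happened to a note, and when."""
    note_id: str
    updated_time: int
    body: Optional[str]  # None means the note was deleted


def replay(changes):
    """Apply changes oldest-first to rebuild the current state of every note."""
    notes = {}
    for change in sorted(changes, key=lambda c: c.updated_time):
        if change.body is None:
            notes.pop(change.note_id, None)
        else:
            notes[change.note_id] = change.body
    return notes


log = [
    Change("a", 1, "draft"),
    Change("b", 2, "todo list"),
    Change("a", 3, "final text"),
    Change("b", 4, None),
]
print(replay(log))
```

Each client that replays the same log ends up with the same notes, whatever order the entries arrived in, which is the property the analogy is pointing at.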

Not with the current implementation, no. I'm not an expert but wouldn't that be opening up all kinds of problems with file locking or race conditions if you have multiple clients that could be trying to modify the file at once?
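The race-condition worry can be shown with a tiny simulation of a "lost update". This is a sketch under simplifying assumptions (no locking, each client rewrites the whole file), not a description of any real sync backend; all names are hypothetical.

```python
def append_revision(snapshot, revision):
    """Return the full file contents a client would write back after adding a revision."""
    return snapshot + [revision]


# Model 1: one shared history file per note, rewritten in full by each client.
shared_file = ["rev1"]
phone_view = list(shared_file)    # both clients read the same snapshot...
laptop_view = list(shared_file)
shared_file = append_revision(phone_view, "rev2-phone")     # phone writes first
shared_file = append_revision(laptop_view, "rev2-laptop")   # laptop overwrites it
print(shared_file)                # the phone's revision has been lost

# Model 2: one file per change, each with a unique name, so writes never collide.
per_item_files = {"rev1.md": "rev1"}
per_item_files["rev2-phone.md"] = "rev2-phone"
per_item_files["rev2-laptop.md"] = "rev2-laptop"
print(sorted(per_item_files))     # both revisions survive
```

Avoiding the first outcome with a single file needs locking or a merge step on every write, which plain file stores and WebDAV shares do not reliably provide; many small uniquely named files sidestep the problem at the cost of file count.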

I had a quick look at Obsidian's model and it seems that note revisions are only available via Obsidian Sync. So to me that implies that the version history isn't stored on the .md file in any way and that it stores the data in some format detached from the notes themselves within whatever system it uses for sync.

I'm not really sure why the number of files is a problem other than for performing a 'first sync' with a device.

The way I tend to think of the sync is like a database transaction log where you provide a full history of all the "stuff" that has happened to a particular note for the other clients to integrate it into their own database.

I think that if the log were merged into one file, with the date marked before each modified piece of content, the modification history could still be viewed clearly.

Not with the current implementation, no. I'm not an expert but wouldn't that be opening up all kinds of problems with file locking or race conditions if you have multiple clients that could be trying to modify the file at once?

Modifying a note on multiple devices at the same time is very rare. If it does happen, a conflict note can be generated so the user can choose which one to keep, like Evernote does.

I had a quick look at Obsidian's model and it seems that note revisions are only available via Obsidian Sync. So to me that implies that the version history isn't stored on the .md file in any way and that it stores the data in some format detached from the notes themselves within whatever system it uses for sync.
I'm not really sure why the number of files is a problem other than for performing a 'first sync' with a device.

I am not sure whether historical versions can be synchronized in Obsidian, but I don't think the version history is very important. I only restore an old version in very rare situations; what I want is the latest version of each note. So is it possible to sync only the latest version of notes and not the history?
I don't care how large the files are, but having too many small files has seriously affected the sync speed, which is sometimes even less than 1 KB/s. I can't stand it.

If you don't want to keep your note history then you can just turn it off in the app and you won't have all of the revisions stored - however that does affect both the sync target and the client.

I doubt there will be any changes to the sync spec unless someone can properly justify it and provide a draft PR for it. Joplin Server was made to account for the deficiencies in having to deal with slow WebDAV syncing.

I have to admit I still don't really fully understand the issue though. How often are you syncing the entire database? Or just how much data are you adding from one client between syncs? If your sync interval is low enough then it should be syncing pretty often, and even with lots of changes I don't find the speed to be particularly bad - even with NextCloud's broken implementation of WebDAV.

My total data is approximately 3.5 GB in 60,000 files, and the sync speed is very slow, so I compressed them into a single file and synchronized that instead; the sync speed improved greatly.
However, extracting the compressed file to another folder is also slow, and having Joplin load the data afterwards is very slow too. So this is not just a sync speed problem; sometimes it affects the smoothness of the software. (Every file read and write requires an address lookup, so the more files there are, the slower reading and writing become.)

What sync type are you using?
I wouldn't expect it to have a significant impact on Joplin itself because all the local data is written to the database and it only has to write and process new files on the sync target.

I currently use Syncthing; I'll try Joplin Cloud later.

Ok, that makes a bit more sense then as it is looking at a local folder to write files rather than just connecting to the sync target directly over the internet. I've never tried to use a local file sync so I don't have much of a benchmark for it.

If you are moving to Joplin Cloud then I doubt you would have the same issues.

Yes, I think so too, although it may even be worse, because the files being synchronized are the same.
Synchronizing the database files directly would surely solve this problem, but the PC database files cannot be read on a phone; they can only be read by another PC.

I'm looking at/using Joplin across 4 devices, and I'm in the middle of a massive amount of copy/paste/adjust/format/edit etc. for each of my 700+ OneNote notes, totalling around 4.6 GB.

I have my Joplin data saved/synced to pCloud and am using the WebDAV connection on each device.

Seeing things like this isn't helping with my confidence :frowning:
(I really want to give Joplin a fair run before any decision to can it)
