Synchronization Status - error message XXX could not be uploaded: File not found

Operating system

Linux

Joplin version

3.2.12

Desktop version info

Joplin 3.2.12 (prod, linux)

Client ID: 81d315c8a95e44dcb87093767ba51a90
Sync Version: 3
Profile Version: 47
Keychain Supported: No

Revision: d6f1ca4ba

Joplin Batch: 0.2.2
Joplin Disk Usage: 1.3.2
Markdown Prettier: 0.1.0

Sync target

Joplin Server

What issue do you have?

A few months ago I noticed that the size of the /home/blackfrank/snap/joplin-desktop/current/.config/joplin-desktop/resources folder had almost doubled (to 15 GB).

I ran a duplicate-file check and found that the folder contained almost 15k duplicate resource files.
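For anyone who wants to reproduce the duplicate check, here is a minimal Python sketch that groups files by a SHA-256 hash of their contents. It is not the tool used above (that tool isn't named). It only assumes the resources folder holds plain files. Two resource files with identical content are still separate Joplin resources with their own IDs, so a duplicate is not automatically an orphan.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large attachments are not read whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folder: Path) -> dict:
    """Group the files in `folder` by content hash, keeping only groups of 2+."""
    groups = defaultdict(list)
    for path in sorted(folder.iterdir()):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

`sum(len(p) - 1 for p in find_duplicates(folder).values())` then gives the number of redundant copies.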

Recently I found time to look into this further.

I exported a listing of the file names in my resources folder to a text file and also dumped the notes table to a text file. I then compared the two to identify which files in the resources folder do not appear in that table (i.e. are orphans).

After a tedious exercise, I identified 16k files that were not referenced in the notes table.
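The comparison described above can be sketched with Python's sqlite3 module. Two assumptions here are mine, not from the thread: the profile's database is the database.sqlite file next to the resources folder, and each resource file is named <resource id>.<extension>. Run it against a copy of the database with Joplin closed. Like the manual process, it only looks at notes.body.

```python
import sqlite3
from pathlib import Path


def unreferenced_resource_files(db_path: str, resources_dir: Path) -> list:
    """Return resource files whose ID (file name without extension) does not
    occur anywhere in notes.body. Only the notes table is checked."""
    con = sqlite3.connect(db_path)
    try:
        bodies = [row[0] or "" for row in con.execute("SELECT body FROM notes")]
    finally:
        con.close()
    # Newline separators keep one body's tail and the next body's head from
    # combining into an accidental match (IDs are assumed to be hex).
    text = "\n".join(bodies)
    return [
        path
        for path in sorted(resources_dir.iterdir())
        if path.is_file() and path.stem not in text
    ]
```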

I moved these files to a separate folder using a bash shell script, with one line per file like this:
mv -v ~/snap/joplin-desktop/93/.config/joplin-desktop/resources/004ba4d7f2de4dc9b31ac537b331a17e.jpg ~/Downloads/joplin_orphan_cleanup/5_bash_script_orphan_resources/orphans/
I then ran further sanity checks and finally moved them to the trash.
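A hedged Python equivalent of the move step reads the file names from a list instead of generating one mv line per file. The list and the folder paths you pass in are placeholders. It defaults to a dry run, and moving rather than deleting keeps the step reversible.

```python
import shutil
from pathlib import Path


def quarantine(names: list, resources_dir: Path, dest: Path, dry_run: bool = True) -> list:
    """Move each named file from resources_dir into dest.

    Returns the names that were moved (or would be, when dry_run is True).
    Names with no matching file are skipped instead of raising an error.
    """
    if not dry_run:
        dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in names:
        src = resources_dir / name
        if not src.is_file():
            continue  # already moved, or never existed
        if not dry_run:
            shutil.move(str(src), str(dest / name))
        moved.append(name)
    return moved
```

Call it once with dry_run=True to review what would move, then again with dry_run=False.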

Finally, in Joplin (under the sync options), I selected "Re-upload local data to sync target" and ran the sync so that any orphan files would be removed from the server.

This process seems to have worked well, but I am left with many thousands of sync status messages on my laptop, along the lines of:

000147.jpg (64b1d54174a84f7f8c84cb91f5a9e3c1) could not be uploaded: File not found: /home/XXX/snap/joplin-desktop/93/.config/joplin-desktop/resources/64b1d54174a84f7f8c84cb91f5a9e3c1.jpg
(each message has Ignore and Retry buttons next to it)
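If the Synchronization Status text is copied into a file, the affected resource IDs can be pulled out with a regular expression. This is handy for building SQL lists or comparing two copies of the screen. The pattern assumes every message has the shape shown above, with a 32-character lowercase hex ID in parentheses before "could not be uploaded".

```python
import re

# Assumed message shape: "<file> (<32-hex id>) could not be uploaded: ..."
ID_PATTERN = re.compile(r"\(([0-9a-f]{32})\) could not be uploaded")


def failed_upload_ids(status_text: str) -> list:
    """Return the unique resource IDs, in order of first appearance."""
    return list(dict.fromkeys(m.group(1) for m in ID_PATTERN.finditer(status_text)))
```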

I cannot click the Ignore button 15k times: each click takes a few seconds while Joplin updates some further underlying database tables, so this would take ages.

I tried both solutions from the forum thread "How to remove error messages at Synchronization Status?":

  • The first solution did not work for me. When I ran it, the local database was recreated with all my deleted IDs added to the notes table yet again.
  • The second solution fails. My SQL clients ("DB Browser for SQLite" and DBeaver) accept a statement like this:
delete from sync_items where id in (11073, 11074, +++);
but then report that zero rows were affected, regardless of whether I run it with 15k keys or with 5.
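A zero-row result just means nothing matched the WHERE clause. One hedged thing to check is which column the keys belong to: 11073 and 11074 look like sync_items.id row numbers, while the error messages show resource IDs, which (my assumption about the schema) are stored in sync_items.item_id. This sketch counts matches in both columns. It works in chunks so that a 15k-key list stays under SQLite's limit on bound parameters. Run it on a copy of the database with Joplin closed.

```python
import sqlite3


def match_counts(con: sqlite3.Connection, keys: list, chunk: int = 500) -> dict:
    """Count how many of `keys` match sync_items.id and sync_items.item_id.

    item_id holding the resource ID is an assumption about Joplin's schema.
    """
    counts = {"id": 0, "item_id": 0}
    for start in range(0, len(keys), chunk):
        part = keys[start:start + chunk]
        marks = ",".join("?" * len(part))
        for column in counts:  # column names come from the fixed dict above
            sql = f"SELECT COUNT(*) FROM sync_items WHERE {column} IN ({marks})"
            counts[column] += con.execute(sql, part).fetchone()[0]
    return counts
```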

Three concerns:

  1. (support) Any suggestions (other than SQL) for clearing these sync messages?

  2. (feature) Is it feasible to add an "Ignore All" button next to the existing "Retry All" option?


  3. (feature) Is there value in adding a "check for orphans" type of menu option to Joplin, which could automate all or part of the above? The Joplin Batch plugin (Joplin Plugins - Joplin Batch by rxliuli) does not explain the logic it uses to identify orphans, never completes its process on my data set, starts showing orphans even after I have cleaned out all the orphans identified by the process above, and has no bulk-delete option for the orphans I still seem to have. I am not confident trusting this plugin with my production dataset.

I had a quick look at the code, and there does seem to be an orphaned-resource cleaner built into Joplin, which is tied to note history expiry.

If you don't mind losing your note history, you can do this:
Set note history expiry ("Keep note history for") to 1 day in Joplin and perform a sync. Then, at least 24 hours later, run the sync on the same Joplin client. It should clear all note history and orphaned resources older than 1 day (syncing these deletions may take a long time if you have a large history). Once that is complete, you can change the note history expiry back to whatever value you want.
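Joplin stores timestamps as milliseconds since the Unix epoch, so "older than 1 day" corresponds to a cutoff of now minus 86,400,000 ms. As a rough way to check whether the expiry has actually run, this sketch counts revisions older than the cutoff. The revisions table and its created_time column are assumptions about the schema, and Joplin's own cleaner may use a different timestamp column.

```python
import sqlite3
import time

MS_PER_DAY = 24 * 60 * 60 * 1000  # timestamps assumed to be ms since the epoch


def revisions_older_than(con: sqlite3.Connection, days: int, now_ms: int = None) -> int:
    """Count rows in the (assumed) revisions table older than `days` days."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    cutoff = now_ms - days * MS_PER_DAY
    row = con.execute(
        "SELECT COUNT(*) FROM revisions WHERE created_time < ?", (cutoff,)
    ).fetchone()
    return row[0]
```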

It would indeed be helpful to have a button to immediately delete all orphans and/or note history, though, as it seems the only way to clear them requires waiting 24 hours.


Thank you, I will try this soon and report back.

Actually, there is one thing I forgot to consider. The resource cleaning is not directly linked to the sync action, so after 24 hours I think you should open the Joplin desktop client and leave it open for about an hour before syncing, to make sure the cleaning task has completed first (there isn't any status indicator for the cleaning). Also, sync one more time after that sync completes, as further fetch operations sometimes happen if you trigger a sync again after a completed one.

Hello, I tested the proposal over the past week. It had no effect.

This is what I did:

Try 1

  1. I opened Joplin and cancelled the sync that triggers on first load.
  2. Under Tools / Options / Sync, I set the interval to 12 hours.
  3. Under Tools / Options / Note history, I set note history to 1 day (it was set to 90).
  4. I left the PC (with Joplin open) for a day.
  5. I manually ran the sync.
  6. The messages remained.

Try 2
  7. Out of curiosity, I exited Joplin, took a copy of the database and ran a SQL statement that returned the row count for all 33 tables in it.
  8. I then reopened Joplin, clicked Ignore on the first item, waited a few seconds for the screen to update, then clicked Ignore on the next item, and so on. I did this for the first four or so records on screen.
  9. I exited Joplin, took another copy of the database and ran the same row-count statement.
  10. The row counts for all tables were identical before and after (!!).
  11. I then reopened Joplin, opened the sync errors screen, pressed Ctrl+A to select all text on screen and copied it to a text editor.
  12. I clicked Ignore again (as in step 8), noting the details of the items I was ignoring.
  13. I reopened the sync errors screen, pressed Ctrl+A to select all text and copied it to a second text editor instance.
  14. I could find all the IDs I had ignored in step 12 in the second text editor instance.
  15. So is the Ignore button actually doing anything?

Before clicking Ignore, the item appeared on line 5.

After clicking Ignore, the same item appeared on line 16168.
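The before/after row-count check from steps 7 to 10 can be sketched with Python's sqlite3, run against the two database copies. One caveat: row counts only change when rows are inserted or deleted. If Ignore updates a column in an existing row (for example a flag in sync_items), the counts stay identical, so identical counts alone do not prove that Ignore did nothing.

```python
import sqlite3


def row_counts(db_path: str) -> dict:
    """Map every user table in the database to its current row count."""
    con = sqlite3.connect(db_path)
    try:
        names = [r[0] for r in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' "
            "AND name NOT LIKE 'sqlite_%' ORDER BY name")]
        # Names come from sqlite_master itself, so double-quoting is enough.
        return {n: con.execute(f'SELECT COUNT(*) FROM "{n}"').fetchone()[0] for n in names}
    finally:
        con.close()


def changed_tables(before: dict, after: dict) -> dict:
    """Tables whose row count differs; a table missing on one side shows None."""
    return {
        t: (before.get(t), after.get(t))
        for t in sorted(set(before) | set(after))
        if before.get(t) != after.get(t)
    }
```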

Try 3
  16. I opened Joplin and cancelled the sync that triggers on first load.
  17. Under Tools / Options / Sync, I set the interval to 12 hours.
  18. Under Tools / Options / Note history, I left note history at 1 day AND unticked "Enable note history".
  19. I will update this message with my findings, hopefully within 24 hours.

I think you need to restore all the resource files you "moved to a separate folder" before attempting the orphan cleaning process I described, as the errors you are getting are probably blocking that process from working correctly.

Also, it sounds like the "Re-upload local data to sync target" action did not complete correctly, so the sync target may not have all your data if you were to sync to a new device or profile in its current state. Be sure to back up your Joplin data folder (and the resource files you removed) before you attempt anything further.

Two to three weeks ago I restored my backup.
Throughout this period, Joplin on my laptop has been set to keep only one day of note history.


I have successfully synced my laptop, Joplin Server and Android devices several times.

When I run a duplicate-file check, I get 16k duplicate files, worth 6.8 GB.

Focusing on the file ee5cf462e8ef488a9eb0c84f97b79488.pdf: it does not appear in a Joplin search.

When I query the database using
select 'ee5cf462e8ef488a9eb0c84f97b79488' as filename, id from notes where body like '% ee5cf462e8ef488a9eb0c84f97b79488%'
I get an empty result.

So, this file ee5cf462e8ef488a9eb0c84f97b79488.pdf truly is an orphan file.
But Joplin orphan resource processing is not picking it up.
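One detail worth checking in the query above: the pattern '% ee5cf462e8ef488a9eb0c84f97b79488%' has a space before the ID. Joplin links a resource in Markdown as [name](:/<id>), so the character before the ID is normally "/", not a space, and this pattern would not match a genuine link. An empty result from it therefore does not by itself show the file is orphaned. A sketch of the same lookup without the leading space:

```python
import sqlite3


def notes_mentioning(con: sqlite3.Connection, resource_id: str) -> list:
    """IDs of notes whose body contains resource_id anywhere.

    No space before the ID: links look like ![](:/<id>), so a pattern such
    as '% <id>%' would never match them.
    """
    rows = con.execute(
        "SELECT id FROM notes WHERE body LIKE '%' || ? || '%'", (resource_id,))
    return [r[0] for r in rows]
```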

Did you manage to sort out your sync errors then by restoring the backup?

I now have a few things to mention:
  • I previously thought that orphaned resources would get cleaned up automatically in line with the note history setting. I now think that is probably not the case: the code I saw might only be invoked when Joplin Server / Cloud is the sync target, as a server-side maintenance task rather than a client-side one.
  • Resources are not truly orphaned if note history still references them, so selecting from the notes table alone will not reliably identify orphans.
  • It was recently mentioned that you can use a Joplin plugin for orphan resource cleaning. See the thread "Getting and deleting images / resources" for details.
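To account for note history, here is a hedged sketch that also looks in the revisions table before treating a resource as an orphan. It assumes revisions has a body_diff text column holding note-body changes (an assumption about the schema), and it can only see unencrypted text. A True result still only marks a candidate, not proof.

```python
import sqlite3


def revisions_mentioning(con: sqlite3.Connection, resource_id: str) -> list:
    """IDs of revisions whose body_diff contains resource_id.

    The revisions.body_diff column is an assumed schema detail.
    """
    rows = con.execute(
        "SELECT id FROM revisions WHERE body_diff LIKE '%' || ? || '%'", (resource_id,))
    return [r[0] for r in rows]


def is_candidate_orphan(con: sqlite3.Connection, resource_id: str) -> bool:
    """True if neither notes.body nor revisions.body_diff mention the ID."""
    in_notes = con.execute(
        "SELECT 1 FROM notes WHERE body LIKE '%' || ? || '%' LIMIT 1",
        (resource_id,)).fetchone()
    return in_notes is None and not revisions_mentioning(con, resource_id)
```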

Joplin should store which notes it thinks a resource is linked to (if any). One way to find this information is with the debug tool plugin (the "notes" entry in the "note info" panel for a resource), though it should also be possible to find this information using Joplin's database.
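A sketch of the database route: I believe Joplin records note/resource links in a note_resources table with note_id, resource_id and is_associated columns, but treat those names as assumptions and check them against your own schema first. Run it on a copy of the database with Joplin closed.

```python
import sqlite3


def linked_note_ids(con: sqlite3.Connection, resource_id: str) -> list:
    """Note IDs that Joplin has associated with resource_id.

    Assumes a note_resources table with note_id, resource_id and
    is_associated columns (hypothetical here; verify against your schema).
    """
    rows = con.execute(
        "SELECT note_id FROM note_resources "
        "WHERE resource_id = ? AND is_associated = 1 ORDER BY note_id",
        (resource_id,))
    return [r[0] for r in rows]
```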

I'm linking to a related forum post: Are image attachments leaked after deleted images/notes? - #2 by personalizedrefriger

To summarize:


Hi rjo118 -

  1. Yes, sync errors no longer appear since the backup was restored.
  2. Re server-side maintenance: thanks. I cannot access the folders on my Android phone where the Joplin resources are written, so I cannot check whether the duplicates appear there too.
  3. Re "resources are not truly orphaned": what other tables should I check against to confirm whether I have orphans? I'm trying to reverse-engineer the table relationships, and it's very difficult for an end user. Is there an ERD or other design documentation, or is the code the documentation?
  4. As mentioned above, I have concerns about the Joplin Batch plugin (Joplin Plugins - Joplin Batch by rxliuli). It never completes its process on my data set (it hangs partway through). Even if it did complete, it does not explain the logic it uses to identify orphans, it starts showing many orphans even after I have cleaned out all the orphans identified by the process above, and so on. I am not confident trusting this plugin with my production dataset.
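On the ERD question, the schema itself can at least be listed from a copy of the database (the sqlite3 shell's .schema command does the same). This Python sketch maps each table to its columns and finds tables sharing a column name such as resource_id or item_id, which is a quick way to guess at relationships. Column-name matching is only a heuristic, since SQLite does not require relationships to be declared as foreign keys.

```python
import sqlite3


def describe_schema(con: sqlite3.Connection) -> dict:
    """Map each user table to its column names, via PRAGMA table_info."""
    tables = [r[0] for r in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return {t: [col[1] for col in con.execute(f'PRAGMA table_info("{t}")')] for t in tables}


def tables_with_column(schema: dict, column: str) -> list:
    """Tables that have a column with exactly this name."""
    return [t for t, cols in schema.items() if column in cols]
```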

Thanks, I will check this tool out


Gave the debug tool plugin a spin, and it's very cool.
I've attached some initial checks I ran:
check 1.ods (244.1 KB)

I'm struggling to work out how to apply the debug tool plugin to my particular case.
If you were in my shoes, what would you do next/what would you do differently?

It's possible to paste resource IDs into the input at the top of the window (for example ee5cf462e8ef488a9eb0c84f97b79488). This gives more information about the resource. One of the rows includes "notes":


It may be possible to use this row to determine which notes Joplin thinks are associated with the resource (if any). Be aware that you may need to click the "..." button for the linked notes to load.

Thanks for the guidance.

I've attached the enriched file:
try 2.ods (243.8 KB)

In summary, the new details in the notes field support my earlier finding:

  1. There is NO link for the left-hand resource (i.e. it appears to be orphaned):
    notes --> "Linked notes. Note: Joplin doesn't update this frequently. Restarting Joplin and waiting 30 seconds should force this to update." .... I did restart a few times, waiting well over 30 seconds before querying again.

  2. There is a link for the right-hand resource:
    notes --> "Linked notes. Note: Joplin doesn't update this frequently. Restarting Joplin and waiting 30 seconds should force this to update." 4ff6a46f7566478691b72675b29845fd... (this resource was previously identified via the GUI search)

I have no idea whether any such ERD exists. I'm guessing that if you remove a resource used by note history, the sync isn't going to notice those missing links, and it would only be an issue when you try to view those history items. But if you're setting note history to 1 day, that shouldn't really be an issue anyway.

I think the key tables would be notes, resources and sync_items. Looking at your original post, you mentioned deleting values from sync_items to clear the sync errors, but they get recreated automatically. That would be the case if you delete the actual resource files from your Joplin directory but don't delete the matching entries in the resources table of that Joplin client's SQLite database.
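To find resources rows whose file is already gone from disk (the situation described above), here is a sketch. It assumes the resources table has a file_extension column and that files are named <id>.<file_extension>, or just <id> when there is no extension. Run it against a copy of the database.

```python
import sqlite3
from pathlib import Path


def resources_missing_files(con: sqlite3.Connection, resources_dir: Path) -> list:
    """IDs of rows in `resources` whose file is not present in resources_dir.

    The file_extension column and the <id>.<ext> naming are assumptions.
    """
    missing = []
    for rid, ext in con.execute("SELECT id, file_extension FROM resources ORDER BY id"):
        name = f"{rid}.{ext}" if ext else rid
        if not (resources_dir / name).is_file():
            missing.append(rid)
    return missing
```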

I don't know why the orphan cleaning is not working properly for your particular set of data. My suspicion is this: run your script again to delete the duplicate resource files, and write some matching SQL to delete those items from the resources table. If you then still get sync errors, you should be able to clear them by deleting all sync_items where sync_disabled = 1. I think they should then not get recreated, provided you've also correctly deleted the items in the resources table.
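The clean-up suggested above might look like this. It is an untested sketch, meant for a backup copy with Joplin closed. It deletes the resources rows for the given IDs and then every sync_items row with sync_disabled = 1, all in one transaction. Direct SQL bypasses Joplin's own code, so the deletions may not be recorded for sync. Treat it as a last resort.

```python
import sqlite3


def drop_resource_rows(db_path: str, resource_ids: list, clear_disabled_sync_items: bool = True) -> tuple:
    """Delete resources rows for the given IDs and, optionally, every
    sync_items row with sync_disabled = 1, in a single transaction.

    Returns (resources_deleted, sync_items_deleted).
    """
    con = sqlite3.connect(db_path)
    try:
        with con:  # commits on success, rolls back if an exception is raised
            removed = 0
            for rid in resource_ids:
                removed += con.execute("DELETE FROM resources WHERE id = ?", (rid,)).rowcount
            cleared = 0
            if clear_disabled_sync_items:
                cleared = con.execute("DELETE FROM sync_items WHERE sync_disabled = 1").rowcount
        return removed, cleared
    finally:
        con.close()
```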