Corrupted resources DB table, or Joplin API bug, or incomplete API docs?

huy · 4 October 2021 17:33

I'm trying to run the joplin-blog tool, and its author and I are running into problems with Joplin's resources Data API endpoint: [Bug]: exception on one of the attachments · Issue #19 · rxliuli/joplin-utils · GitHub. The problem is that file_extension is empty for some of my resources.

I took a look at my database and either it's corrupted or it has vestiges of an old DB schema, because the fields title, filename, and file_extension are not consistent.

This is the API output for a non-working attachment:

❯ curl 'http://localhost:41184/resources/7de408d4ec98e0fa8314c6b092156956?fields=id,title,filename,file_extension&token=c8a14932909ae1...54846490'
{"id":"7de408d4ec98e0fa8314c6b092156956","title":"IMG_20130713_134357.026.jpg","filename":"IMG_20130713_134357.026.jpg","file_extension":"","type_":4}%

This is the API output for a working attachment:

❯ curl 'http://localhost:41184/resources/070b6ba4b5124ab89c1565397069e519?fields=id,title,filename,file_extension&token=c8a14932909ae1...54846490'
{"id":"070b6ba4b5124ab89c1565397069e519","title":"IMG_6992.jpeg","filename":"","file_extension":"jpeg","type_":4}%
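For anyone reproducing these queries from a script, here is a minimal Python sketch that builds the same request and additionally asks for the mime field, which comes up later in the thread. The helper names, the default base URL, and the token value are illustrative assumptions, not something Joplin provides.

```python
import json
import urllib.parse
import urllib.request


def resource_url(resource_id, token,
                 fields=("id", "title", "mime", "file_extension"),
                 base_url="http://localhost:41184"):
    """Build the Data API URL for one resource's metadata (hypothetical helper)."""
    query = urllib.parse.urlencode(
        {"fields": ",".join(fields), "token": token},
        safe=",",  # keep the commas of the fields list unescaped, as in the curl calls
    )
    return f"{base_url}/resources/{resource_id}?{query}"


def fetch_resource(resource_id, token, **kwargs):
    """Fetch the metadata as a dict; needs a running Joplin with the API enabled."""
    with urllib.request.urlopen(resource_url(resource_id, token, **kwargs)) as response:
        return json.load(response)
```

Calling fetch_resource with one of the ids above and a real token should return a dict with the same keys as the curl output, plus mime.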

Environment

Joplin version: 2.4.9
Platform: macOS
OS specifics: 10.15.7

Steps to reproduce

1. Start using Joplin many years ago
2. Over time, collect a Resources DB table that seems to have multiple schemas
3. Try to query the Resources Web Clipper with http://localhost:41184/resources/<id>?fields=id,title,file_extension&token=<token>
4. Ponder whether the data is corrupted or if there is a deterministic and complete algorithm to determine the filename of the attachment on the filesystem.

Describe what you expected to happen

We're trying to figure out:

1) Is my data corrupted?
2) If corrupted, how can I fix it?
3) If not corrupted, is it that the Resources schema follows multiple schemes, and the API should be backwards-compatible, but there's a bug because it's not backwards-compatible?
4) If the data is not corrupted and there is no bug in the API, is this just a documentation problem? Is there a complete and deterministic algorithm for figuring out the filename of the attachment on the filesystem?

roman_r_m · 4 October 2021 19:47

One way to test would be to pick a few resources that have missing fields, find corresponding notes and see if these notes are rendered correctly.

I see more or less the same in my database. filename is not used anymore, I think.
As for title, I've checked a few resources without titles, and all of them were created by clipping pages from the browser a long time ago. So either this is how the web clipper works or there is/was a bug there.

huy · 4 October 2021 21:37

The notes look fine to me.

Well, if you see about the same thing in your DB, then I'm relieved that my DB is probably not corrupted. Thanks for checking @roman_r_m !

I guess this is an API documentation issue then. Clients of the Data API that need to determine the filename of a resource need to mirror exactly what Joplin core does and not just rely on file_extension, which isn't guaranteed to be set for old database rows.

CC: @rxliuli

huy · 5 October 2021 08:06

Yeah, Resources with inconsistent schema; e.g. missing filename_extension field · Issue #5528 · laurent22/joplin · GitHub essentially confirms that this is a deficiency in the Data API documentation: there's no documented way to determine the filename of a resource without digging through Joplin core source code.

laurent · 5 October 2021 09:25

You're completely missing the point. A resource is just a blob of data with an id and there's no guarantee about any particular metadata being present - we just collect whatever info is available when the resource is created.

Then how a resource is named is application dependent and may or may not be based on that metadata.

roman_r_m · 5 October 2021 09:46

Mime type should always be available and you should be able to derive the file extension from it.

But generally, as Laurent said, a resource isn't necessarily a file you've attached and may not have a name or extension. For example, you can take a picture from the mobile app and it'll create a resource.

laurent · 5 October 2021 10:55

The mime type indeed seems to always be defined, although it's not enforced at the application level so I think an empty mime type could happen. In that case, if a name is needed it would be something generic like "data.bin".
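The fallback order discussed so far could be sketched in Python roughly as follows. This is only an illustration, not Joplin's own logic: the function name is made up, the step that reads the extension from title or filename is an extra assumption based on the sample output earlier in the thread, and the final "bin" mirrors the generic "data.bin" idea.

```python
import mimetypes
import os


def guess_extension(resource):
    """Best-effort extension for a resource dict from the Data API (hypothetical helper)."""
    # 1. Trust the stored metadata when it is present.
    ext = (resource.get("file_extension") or "").lstrip(".")
    if ext:
        return ext
    # 2. Assumption: fall back to a suffix found in title or filename.
    #    A title such as "report v1.2" would be misread as extension "2".
    for key in ("title", "filename"):
        _, suffix = os.path.splitext(resource.get(key) or "")
        if suffix:
            return suffix.lstrip(".")
    # 3. Derive it from the mime type, which is usually (not always) set.
    guessed = mimetypes.guess_extension(resource.get("mime") or "")
    if guessed:
        return guessed.lstrip(".")
    # 4. Nothing usable: generic binary extension.
    return "bin"
```

For the non-working attachment above, step 2 would yield "jpg" from the title even though file_extension is empty.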

rxliuli · 5 October 2021 14:35

This seems to me to be problematic because I will not be able to determine what the extension of the attachment on the file system is. . .

roman_r_m · 5 October 2021 15:11

Why do you need the extension?

rxliuli · 5 October 2021 16:03

Simply put, I need the extension of the resource in the following scenarios:

1. Copying resources from the JoplinProfile/resources/ directory.
2. Calculating the attachment links in an exported note: I must use the correct extension when building the file name, just as Joplin itself replaces :/id with /resources/id.file_extension when editing in an external editor.
3. Editing in VS Code: the newly created attachment file has to be opened by VS Code, and its file name is built from the title and the extension. (I try to stay compatible with Joplin's rules for naming temporary attachment files, which admittedly depends on Joplin's internal logic.)

Currently, I plan to write a batch processing function in joplin-batch-web to automatically fill in file_extension according to the mime field. @huy
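As an illustration of the link-rewriting scenario, here is a small Python sketch that replaces :/id targets in Markdown with /resources/id.extension. The function name, the prefix default, and the extensions mapping are assumptions made for this example. It only handles Markdown links of the form (:/id) and leaves HTML src attributes alone.

```python
import re

# Joplin resource ids in this thread are 32 lowercase hex characters.
RESOURCE_LINK = re.compile(r"\(:/([0-9a-f]{32})\)")


def rewrite_resource_links(markdown, extensions, prefix="/resources/"):
    """Rewrite (:/id) link targets using a {resource_id: extension} mapping."""
    def replace(match):
        resource_id = match.group(1)
        if resource_id not in extensions:
            return match.group(0)  # unknown resource: leave the link untouched
        ext = extensions[resource_id]
        name = f"{resource_id}.{ext}" if ext else resource_id
        return f"({prefix}{name})"

    return RESOURCE_LINK.sub(replace, markdown)
```

A resource with an empty extension is linked by its bare id, so the result still depends on how the extension was determined in the first place.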

laurent · 5 October 2021 16:04

You probably shouldn't access anything in that folder as it's application internal data. Instead why don't you fetch the resource from the API?

Also see my answer there - you can either get the file extension from the metadata or derive from the mime type.
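A minimal Python sketch of fetching the file through the API instead of reading the profile folder could look like this. It assumes the GET /resources/:id/file endpoint of the Data API; the helper names and the destination path are up to the caller and are not part of Joplin.

```python
import urllib.parse
import urllib.request
from pathlib import Path


def resource_file_url(resource_id, token, base_url="http://localhost:41184"):
    """URL that returns the binary content of a resource (hypothetical helper)."""
    query = urllib.parse.urlencode({"token": token})
    return f"{base_url}/resources/{resource_id}/file?{query}"


def download_resource(resource_id, token, dest_path, base_url="http://localhost:41184"):
    """Download one resource and write it to dest_path; needs a running Joplin."""
    with urllib.request.urlopen(resource_file_url(resource_id, token, base_url)) as response:
        data = response.read()
    dest = Path(dest_path)
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(data)
    return dest
```

The caller still has to choose the destination file name, for example from file_extension or the mime type.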

github.com/laurent22/joplin

Resources with inconsistent schema; e.g. missing filename_extension field

opened 02:42PM - 04 Oct 21 UTC

closed 04:05PM - 04 Oct 21 UTC

huyz

bug


rxliuli · 5 October 2021 16:14

Yes. I decided this mainly because copying files directly from the file system in batches with Node.js may be much faster than using HTTP requests. And even with the API, I would still have to write the returned Buffer objects to files myself, so the file name would still need to be computed manually.

Yes, I can do this, but as mentioned above, I will also create a batch repair program and see how it works out.

roman_r_m · 5 October 2021 16:14

If you really need to access the file you can rely on the fact that for every resource there is only 1 file in the resources folder with the name matching the resource id.

This way you can just filter on file name ignoring extension.

rxliuli · 5 October 2021 16:17

This is indeed an effective hack solution (～￣▽￣)～

roman_r_m · 5 October 2021 16:21

You may want to test with encryption enabled. I think in this case there may be not 1 but 2 files for each id, the 2nd one being the encrypted version of the same resource. Should be trivial to filter them out.
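The lookup by id could be sketched in Python as below. Treat it as a hack in the spirit of this thread: the resources folder is application internal data, the function name is made up, and the ".crypted" suffix for the encrypted copy is an assumption that should be checked against your own profile.

```python
from pathlib import Path


def find_resource_file(resources_dir, resource_id, ignored_suffixes=(".crypted",)):
    """Return the single file whose name (minus extension) is resource_id, or None."""
    candidates = [
        path
        for path in Path(resources_dir).iterdir()
        if path.is_file()
        and path.name.split(".", 1)[0] == resource_id  # compare the part before the first dot
        and path.suffix not in ignored_suffixes  # skip the assumed encrypted copy
    ]
    if len(candidates) > 1:
        raise ValueError(f"more than one file found for resource {resource_id}")
    return candidates[0] if candidates else None
```

Raising on more than one match keeps the "only 1 file per id" assumption explicit instead of silently picking one.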

system · 4 November 2021 16:21

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
