Corrupted resources DB table, or Joplin API bug, or incomplete API docs?

I'm trying to run the joplin-blog tool, and its author and I are running into problems with Joplin's resources Data API endpoint: "[Bug]: exception on one of the attachments · Issue #19 · rxliuli/joplin-utils · GitHub". The problem is that file_extension is empty for some of my resources.

I took a look at my database, and either it's corrupted or it has vestiges of an old DB schema, because the fields title, filename, and file_extension are not populated consistently:

This is the API output for a non-working attachment:

❯ curl 'http://localhost:41184/resources/7de408d4ec98e0fa8314c6b092156956?fields=id,title,filename,file_extension&token=c8a14932909ae1...54846490'
{"id":"7de408d4ec98e0fa8314c6b092156956","title":"IMG_20130713_134357.026.jpg","filename":"IMG_20130713_134357.026.jpg","file_extension":"","type_":4}%

This is the API output for a working attachment:

❯ curl 'http://localhost:41184/resources/070b6ba4b5124ab89c1565397069e519?fields=id,title,filename,file_extension&token=c8a14932909ae1...54846490'
{"id":"070b6ba4b5124ab89c1565397069e519","title":"IMG_6992.jpeg","filename":"","file_extension":"jpeg","type_":4}%

Environment

Joplin version: 2.4.9
Platform: macOS
OS specifics: 10.15.7

Steps to reproduce

  1. Start using Joplin many years ago
  2. Over time, collect a Resources DB table that seems to have multiple schemas
  3. Try to query the resources endpoint of the Web Clipper Data API with http://localhost:41184/resources/<id>?fields=id,title,file_extension&token=<token>
  4. Ponder whether the data is corrupted or if there is a deterministic and complete algorithm to determine the filename of the attachment on the filesystem.

Describe what you expected to happen

We're trying to figure out:

  1. is my data corrupted?
  2. if corrupted, how can I fix it?
  3. if not corrupted, is it that the Resources schema follows multiple schemes, and the API should be backwards-compatible, but there's a bug because it's not backwards-compatible?
  4. If the data is not corrupted and there is no bug in the API, is this just a documentation problem? Is there a complete and deterministic algorithm for figuring out the filename of the attachment on the filesystem?

One way to test would be to pick a few resources that have missing fields, find corresponding notes and see if these notes are rendered correctly.
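
To pick such resources without opening the database, something like the following might help. It is only a sketch: it assumes the Data API's paginated list response (an `items` array plus a `has_more` flag) and that `mime` can be requested as a field; `fetch_page` and `resources_missing` are names made up for this example.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:41184"  # Web Clipper address used in this thread


def fetch_page(token, page):
    """Fetch one page of resource metadata from the Data API."""
    query = urllib.parse.urlencode({
        "fields": "id,title,filename,file_extension,mime",
        "page": page,
        "token": token,
    })
    with urllib.request.urlopen(f"{BASE_URL}/resources?{query}") as resp:
        return json.load(resp)


def resources_missing(field, token, fetch=fetch_page):
    """Return every resource whose `field` is empty, following pagination.

    `fetch` is injectable so the loop can be tried without a running Joplin.
    """
    missing, page = [], 1
    while True:
        data = fetch(token, page)
        missing += [r for r in data["items"] if not r.get(field)]
        # Assumption: the API signals further pages with "has_more".
        if not data.get("has_more"):
            return missing
        page += 1
```

Calling `resources_missing("file_extension", token)` would then give a list of ids to look up in the corresponding notes.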

I see more or less the same in my database. filename is not used anymore, I think.
As for title, I've checked a few resources without titles and all of them were created by clipping pages from the browser a long time ago. So either this is how the web clipper works or there is/was a bug there.

The notes look fine to me.

Well, if you see about the same thing in your DB, then I'm relieved that my DB is probably not corrupted. Thanks for checking @roman_r_m !

I guess this is an API documentation issue, then. Clients of the Data API that need to determine the filename of a resource have to mirror exactly what Joplin core does and cannot just rely on file_extension, which isn't guaranteed to be set for old database rows.

CC: @rxliuli

Yeah, "Resources with inconsistent schema; e.g. missing filename_extension field · Issue #5528 · laurent22/joplin · GitHub" essentially confirms that this is a deficiency in the Data API documentation: there's no documented way to determine the filename of a resource without digging through the Joplin core source code.

You're completely missing the point. A resource is just a blob of data with an id and there's no guarantee about any particular metadata being present - we just collect whatever info is available when the resource is created.

Then how a resource is named is application dependent and may or may not be based on that metadata.

Mime type should always be available and you should be able to derive the file extension from it.

But generally, as Laurent said, a resource isn't necessarily a file you've attached and may not have a name or extension. For example, you can take a picture from the mobile app and it'll create a resource.

The mime type indeed seems to always be defined, although it's not enforced at the application level so I think an empty mime type could happen. In that case, if a name is needed it would be something generic like "data.bin".
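
A rough sketch of that fallback order (stored extension first, then the mime type, then a generic default) could look like this. It is not what Joplin core does internally; `guess_extension`, `resource_file_name` and the small `COMMON_EXTENSIONS` table are invented for the example, and Python's standard `mimetypes` module is used for less common types.

```python
import mimetypes

# Explicit entries for common types, so the result does not depend on the
# platform's MIME database. Hypothetical table, extend as needed.
COMMON_EXTENSIONS = {
    "image/jpeg": "jpg",
    "image/png": "png",
    "image/gif": "gif",
    "application/pdf": "pdf",
}


def guess_extension(resource):
    """Pick an extension for a resource dict as returned by the Data API."""
    # 1. Use the stored extension when the row has one.
    ext = (resource.get("file_extension") or "").lstrip(".")
    if ext:
        return ext
    # 2. Otherwise derive it from the mime type.
    mime = (resource.get("mime") or "").lower()
    if mime in COMMON_EXTENSIONS:
        return COMMON_EXTENSIONS[mime]
    guessed = mimetypes.guess_extension(mime) if mime else None
    # 3. Fall back to a generic extension, in the spirit of "data.bin".
    return guessed.lstrip(".") if guessed else "bin"


def resource_file_name(resource):
    """Build an `<id>.<extension>` name, e.g. for links in exported notes."""
    return f"{resource['id']}.{guess_extension(resource)}"
```

The request would need to include `mime` in its `fields` list for step 2 to have anything to work with.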

This seems problematic to me, because I will not be able to determine the extension of the attachment on the file system...

Why do you need the extension?

Simply put, I need to get the extension of the resource in the following scenarios

  1. Copy resources from the JoplinProfile/resources/ directory
  2. Calculate the link to the attachment in the exported note. I must use the correct extension when building the file name, just like Joplin itself replaces :/id with /resources/id.file_extension when editing in an external editor
  3. When editing in VS Code, I need VS Code to open the newly created attachment file, and its file name is built from the title and the extension (I try to be compatible with Joplin's rules for naming temporary attachment files; of course, this effectively depends on Joplin's internal logic)

Currently, I plan to write a batch processing function in joplin-batch-web to automatically fill in file_extension according to the mime field. @huy

You probably shouldn't access anything in that folder as it's application internal data. Instead why don't you fetch the resource from the API?

Also see my answer there - you can either get the file extension from the metadata or derive from the mime type.
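
For reference, fetching the content through the API might look roughly like this. It is a sketch that relies on the Data API's `/resources/:id/file` endpoint; the helper names and the `dest_path` parameter are made up for the example.

```python
import urllib.parse
import urllib.request


def resource_file_url(resource_id, token, base_url="http://localhost:41184"):
    """Build the URL that returns the raw content of a resource."""
    query = urllib.parse.urlencode({"token": token})
    return f"{base_url}/resources/{resource_id}/file?{query}"


def download_resource(resource_id, token, dest_path):
    """Save the resource content to dest_path.

    The caller still has to choose the file name, including the extension.
    """
    with urllib.request.urlopen(resource_file_url(resource_id, token)) as resp:
        with open(dest_path, "wb") as out:
            out.write(resp.read())
```

This avoids touching the profile folder at all, at the cost of one HTTP request per resource.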

Yes, this decision was mainly based on the consideration that copying files in bulk directly from the file system in Node.js may be much faster than using HTTP requests. Even with the API, I would still have to write the returned Buffer objects to files myself, so the file name has to be calculated manually either way.

Yes, I can do this, but as mentioned above, I will also create a batch repair program and see how well it works.

If you really need to access the file you can rely on the fact that for every resource there is only 1 file in the resources folder with the name matching the resource id.

This way you can just filter on file name ignoring extension.


This is indeed an effective hack solution (~ ̄▽ ̄)~

You may want to test with encryption enabled. I think in this case there may be not 1 but 2 files for each id, the 2nd one being the encrypted version of the same resource. It should be trivial to filter them out.
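
A minimal sketch of that lookup, matching on the id and ignoring the extension, could be written as below. The `.crypted` suffix for encrypted copies is an assumption that should be checked against a real profile with encryption enabled, and `find_resource_file` is a made-up helper name.

```python
from pathlib import Path


def find_resource_file(resources_dir, resource_id, skip_suffixes=(".crypted",)):
    """Return the file in resources_dir whose name (minus extension) is resource_id.

    Files whose suffix is in skip_suffixes are ignored; ".crypted" is an
    assumed suffix for encrypted copies, not something confirmed in this thread.
    Returns None when nothing matches and raises if the match is ambiguous.
    """
    matches = [
        p for p in Path(resources_dir).glob(f"{resource_id}*")
        if p.is_file()
        and p.name.split(".")[0] == resource_id
        and p.suffix not in skip_suffixes
    ]
    if len(matches) > 1:
        raise ValueError(f"several files match resource {resource_id}: {matches}")
    return matches[0] if matches else None
```

The suffix of the returned path then gives the on-disk extension, whatever the metadata says.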

