I updated my system in March. Joplin had no issues at that time; everything was accounted for. My update involved reformatting the hard drive my /home directory lives on. I had Joplin export all notebooks to a .jex before I did this, only to find a couple of days ago that a bunch of stuff is missing. I recovered the original database.sqlite from the hard drive with R-Linux, which reported no indications of damage. I then copied that original file into my "development mode" profile and tried to open it, hoping to restore my stuff. Joplin instead opens to a white screen and throws an error: "Error: Error: SQLITE_NOTADB: file is not a database: SELECT * FROM version LIMIT 1..." (It goes on; the full error is in a log file below.)
The terminal I used to launch Joplin in dev mode says the following:
Sentry: Initialized with autoUploadCrashDumps: false
MESA-INTEL: warning: Haswell Vulkan support is incomplete
(@joplinapp-desktop:81301): dbind-WARNING **: 02:55:54.186: Couldn't connect to accessibility bus: Failed to connect to socket /root/.cache/at-spi/bus_0: Permission denied
Segmentation fault (core dumped)
I enabled the logging features as described on the "How to enable debugging" page. log-database.txt is 32.9 MB of unreadable characters. log.txt is over 300 MB; it starts with a LOT of unreadable characters and only becomes legible at the end, where it's identical to what's displayed in the console. I've extracted the part that actually says things and uploaded it below, along with one of the crash dumps that Joplin has been producing as a result of this.
I'm really at a loss here. Is there a way to fix the file or extract its contents? Thank you so much!
It might be possible to recover the content of individual notes by opening the database in a text editor (I tested this with Visual Studio Code). If you try this, be sure to create a copy of the database first to prevent additional corruption if the text editor tries to save the file.
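If wading through the binary in a text editor is too painful, a small script that pulls out the printable runs of text can do much the same job. This is only a rough sketch (nothing Joplin-specific about it), and the file name and minimum run length are placeholders you would adjust:

```python
# Rough sketch: pull printable text runs out of a (possibly corrupt) SQLite file,
# much like the Unix `strings` utility. Work on a COPY of the recovered file.
import re

DB_PATH = "database.sqlite"   # placeholder path to your copied database
MIN_LEN = 20                  # ignore runs shorter than this many characters

with open(DB_PATH, "rb") as f:
    data = f.read()

# Printable ASCII plus tab/newline/carriage return, in runs of at least MIN_LEN bytes.
pattern = re.compile(rb"[\t\n\r\x20-\x7e]{%d,}" % MIN_LEN)

with open("extracted-strings.txt", "w", encoding="utf-8") as out:
    for match in pattern.finditer(data):
        out.write(match.group().decode("ascii", errors="replace") + "\n")
```

Note text that isn't plain ASCII won't be caught by that pattern, but it should at least tell you whether your notes are still sitting in the file.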
Great suggestion! I tried opening it in a database viewer, but that also refused to recognize the file. VSCodium (FOSS version of VSCode) can open the file, but only displays those invalid "NUL" characters. Not sure what that means, but it doesn't seem good.
I've learned that R-Linux isn't strictly designed to recover .sqlite files, as it's not in the "known filetypes" list. Maybe the program can recognize "this file lives here" but can't actually reconstruct its contents? Might be time to try different software.
This is my first time doing any data recovery, so I am both a noob and very stressed. I'm sure it'll go great! Thanks again.
Data recovery utilities try to use file system metadata to determine the clusters* a deleted file occupied back when it still had space allocated to it on the disk. The utility then pieces the current data in those clusters back together into a file. However, if the file being recovered was large, if there was not much free space on the drive, or if the deletion happened some time ago, there is a good chance that a later file or files have since been allocated one or more of those clusters. The recovery program may present a file as "recovered", but chunks of it will be data from other files and so, essentially, garbage.
Without any idea of the previously used clusters, the program may instead try to identify a file signature ("magic number") and follow contiguous clusters until it finds what it thinks is a footer marking the end of the file (if that file format has one) or the "magic number" of a different file type, and then present that data as a possible recovered file. In this case getting garbage is the likely outcome.
The reuse of clusters is also why you should never keep using the disk, or recover files onto the disk you are recovering data from, as you may actually be overwriting the data you hope to get back!
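To make the "magic number" idea concrete, here is roughly what a signature scan does. This is only an illustration and the image path is a placeholder, but the 16-byte header is the real SQLite one, so it will report every offset in a raw image where a SQLite database could start:

```python
# Illustration only: report every offset in a raw image where the 16-byte
# SQLite header ("SQLite format 3\0") appears. The image path is a placeholder.
MAGIC = b"SQLite format 3\x00"
IMAGE = "disk-image.dd"
CHUNK = 64 * 1024 * 1024  # read 64 MB at a time

offset = 0   # bytes consumed so far
tail = b""   # carry-over so a header straddling two chunks is still found
with open(IMAGE, "rb") as img:
    while True:
        block = img.read(CHUNK)
        if not block:
            break
        data = tail + block
        pos = data.find(MAGIC)
        while pos != -1:
            print("possible SQLite header at byte", offset - len(tail) + pos)
            pos = data.find(MAGIC, pos + 1)
        tail = data[-(len(MAGIC) - 1):]
        offset += len(block)
```

A real carver would only consider offsets that fall on filesystem-block boundaries, and it still has to guess where the file ends, which is where the garbage comes from.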
My guess is that your recovered file is corrupted: it likely now contains data that was never part of the original database file, which is why you get the message that the file is "NOTADB - file is not a database".
Good luck finding a tool that will do any better; I have not heard of one that massages the recovered data into a valid file of that (or any other) filetype. They just grab raw data from the disk.
It is sometimes possible to perfectly restore a database that has become corrupted, but that is the exception. Usually the recovered database will be defective in a number of ways...
I wish you the best of luck ...
* I believe that the equivalent in ext4 file systems may be called "blockgroups".
That's all good advice. I'm familiar with the basics of how this works; I've just never actually done it before. When I realized something was amiss, I made a byte-for-byte image of the entire disk on an external drive to preserve it, which is as good as it gets for normal people in data recovery. (There are forensics techniques that can potentially recover overwritten data off the original physical disk, but as I am neither a multi-billion-dollar corporation nor a government, those are unavailable to me.) R-Linux's pro version, R-Suite, allows the user to define new filetypes to increase the chance of successful recovery, so that clearly matters for whatever specific recovery techniques this software is using. R-Linux doesn't let me do that, hence I'm looking around for other software before considering forking over $80 for a pro license.
I'm not ruling out the possibility that the file's irretrievably corrupted; I just don't think it's likely, given where it sits on the drive and how little I've written to that drive since the reformat. And there are years of notes in that file. I refuse to give up until I've tried every means available to me.
I really wish I'd noticed that the .jex export was incomplete sooner. You should all check to make sure ABSOLUTELY EVERYTHING is there when you do an export like that, because "all notes" apparently doesn't actually mean "all notes." And I'm also kicking myself pretty hard for not having proper backups of things! Back up your files, people!! It doesn't matter how careful you are!!!
There is a recovery utility called PhotoRec, part of TestDisk, whose name really underplays its capabilities. It will data carve for hundreds of file types, including SQLite. However, both R-Tools and PhotoRec mention that carving a file from unallocated space using file signatures can recover the whole file, but only if there is no data fragmentation.
Some file formats (like image files such as JPG etc.) will display what they can even if a bit of the file is garbage. You just get a chunk missing from the picture or weird colours. Other file formats will not be so forgiving and I expect a database file to be one of those.
If you can mount your disk image as a disk, you could run PhotoRec over it. There are a lot of PhotoRec how-tos out there.
Alternatively, Sleuthkit / Autopsy is a free forensic recovery tool (Windows, Linux, Mac) that ingests a disk image file. The disk image data sources it handles are:
Raw Single (*.img, *.dd, *.raw, *.bin)
Raw Split (*.001, *.aa)
EnCase (*.e01)
Virtual Machine Disk (*.vmdk)
Virtual Hard Disk (*.vhd)
It also has a PhotoRec module built in that will search for a SQLite file signature. If your disk was re-formatted, I would guess that you would need to get Autopsy to ignore any current file system in the disk image when ingesting, and then parse for SQLite files as if the entire disk were unallocated. It's not a simple "point and click" piece of software and there's a learning curve, but you appear to be quite familiar with a lot of the concepts already. I have "played" with Autopsy but it has never been a tool that I have used to any great degree, or recently for that matter. Also, be aware that depending on the power of the processing machine and the size of the disk that was imaged, ingesting and carving could take hours or days to complete.
I believe that Kali Linux has Autopsy pre-installed.
BUT...
Before committing cash or a lot of time, why not first look at your JEX file? A JEX export does not include any note history, but I have never known Joplin not to export all notes and resources when asked to. The JEX file is just a RAW export wrapped up in a tar container. You could use an archive utility to expand the file back out into a directory of plain .md files and search that directory for any files that actually contain a string or strings from a note that is apparently missing. Why they never imported would be another issue, but at least the data would be there.
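If you don't have an archive manager handy, something along these lines does the unpack-and-search in one pass. Just a sketch: the file name and search string are placeholders, and it only looks inside the .md payloads:

```python
# Sketch: treat a JEX export as the tar archive it is, and list every .md member
# that contains a given search string. File name and search string are placeholders.
import tarfile

JEX_PATH = "export.jex"
SEARCH = "a phrase from a note you think is missing"

with tarfile.open(JEX_PATH, "r") as jex:
    for member in jex.getmembers():
        if not member.name.endswith(".md"):
            continue
        handle = jex.extractfile(member)
        if handle is None:
            continue
        text = handle.read().decode("utf-8", errors="replace")
        if SEARCH.lower() in text.lower():
            print(member.name)
```

If the string turns up, the note made it into the export and the problem is on the import side; if nothing matches, the note never left the database.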
Before committing cash or a lot of time, why not first look at your JEX file?
Another fantastic idea! It is indeed full of .md files and one (1) image in the resources folder. But there's not nearly enough stuff in there, and I note with interest that every single file, image included, lists the exact same date and time in the archive's "last modified" metadata. This is neither the same date listed in Joplin's own embedded metadata, nor is it the date I told Joplin to export the backup. Soooo... that's weird. I've been assuming some combination of having a lot of notes and updating the program multiple times, thus possibly changing the database formatting multiple times, broke the exporter logic somehow.
However, both R-Tools and PhotoRec mention that carving a file from unallocated space using file signatures can recover the whole file, but only if there is no data fragmentation.
I don't know how to tell whether a file is fragmented. R-Linux didn't think the database was, but I don't know how trustworthy that assessment is if it doesn't know how to read that filetype. I don't actually know which method (signature or filesystem metadata) it used to try to recover the database, either. There doesn't seem to be a way to select that, at least not in the freeware.
Also, be aware that depending on the power of the processing machine and the size of the disk that was imaged, ingesting and carving could take hours or days to complete.
Bring it on! Took over 12 hours to image the hard disk. My nearly 10-year-old system and I have nothing but time! I'll look at both PhotoRec (which... I may have tried earlier while panicking...? it's hard to remember) and Autopsy. Thanks so much for your help, it's hugely appreciated.
They say their software can recover the whole file only if there is no data fragmentation because, when it comes to recovering a file from unallocated space using a file signature, their software cannot tell whether the file was fragmented either.
If you are recovering a deleted file from the file system on which it was originally stored, there can still be records of the deleted file's metadata in that file system. From these, the recovery software can follow the list of areas on the disk where the data for that deleted file was once stored. If that list is complete, it does not matter whether the file was fragmented, because the software knows where all the pieces of the file once lived. It can then cross-reference that list with the areas currently used by all the other files on the system. If it finds that one or more of the storage areas the deleted file used is now occupied by another file, it can warn the user that the file is deleted and overwritten. If none have been touched, the file can be exported.
However, your database file was not deleted. You have formatted the disk so that file system metadata is gone, replaced by the new file system. The only real option the software has is to find a cluster that starts with 53 51 4C 69 74 65 20 66 6F 72 6D 61 74 20 33 00 at offset 0 (the hex magic number for SQLite) and start scooping up contiguous storage areas until it programmatically decides it may have a complete file. That is a serious amount of computer "guesswork". If the original file was fragmented (i.e. not contiguous), it will not know that, and so the data it scoops up will not all belong to the original file.
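Whatever tool ends up doing the carving, you can at least ask SQLite itself whether a recovered candidate hangs together. A minimal sketch, assuming Python's built-in sqlite3 module and a placeholder path; run it against a copy of the carved file:

```python
# Sketch: ask SQLite whether a recovered candidate file is internally consistent.
# The path is a placeholder; point it at a COPY of the carved file.
import sqlite3

CANDIDATE = "carved-candidate.sqlite"

try:
    con = sqlite3.connect(CANDIDATE)
    rows = con.execute("PRAGMA integrity_check").fetchall()
    # A single "ok" means the database structure checks out;
    # anything else is a list of the problems SQLite found.
    for (message,) in rows:
        print(message)
except sqlite3.DatabaseError as err:
    # The same family of error Joplin is reporting ("file is not a database").
    print("SQLite rejected the file:", err)
```

A pass doesn't prove every note survived, but it is a strong sign the carve picked up the right clusters; a fail tells you quickly that a candidate isn't worth importing.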
You have formatted the disk so that file system metadata is gone, replaced by the new file system.
It's not, in fact, gone! The new partition starts at the beginning of the drive, but the old partition I'm recovering from starts about halfway through its capacity. The latter half of the drive seems to have barely been touched, if at all. The old filesystem's metadata records are still there and intact; R-Linux can find and read them, and uses them to reconstruct the old filesystem.
So that's why I'm not sure whether it's trying to read the file signature, or copying the file blindly based on the old metadata, or some combination of both. Linux (or at least ext4, which Mint uses) is good at avoiding fragmentation, so any given file on a healthy disk will usually be contiguous, and that will be reflected in the metadata. R-Linux, not being designed to work with .sqlite files, won't necessarily be able to tell if that file has been overwritten since. It might just be taking the old metadata at its word. It's also possible the database issue is completely unrelated to all this recovery business. I won't know unless I can nab the file intact and rule out the software I'm using to do that as a potential problem.
So, for science, I think I'm going to give PhotoRec another shot. I did actually try it first. Pretty sure I put the wrong command(s) in, and I think I was also looking for the wrong filetype at the time... yeah, trying to learn new software while having a panic attack is not a good idea. Adrenaline makes your logic brain stop working.
So it turns out that imaging your main storage drive takes up a lot of space. The image R-Linux made is compressed in a proprietary format, so I have to export it to a raw .dsk before I can attempt to recover from it with any other tool. I don't have enough storage to set that up, even after spelunking for spare drives. R-Linux can export an image of only the part of the drive it recognizes as the old partition, which might give me enough wiggle room. Is there any benefit to keeping the image of the entire physical disk, or can I get away with an image of just that partition?
The benefit would be that if it later turns out you actually needed some data that resides outside the area of the disk you captured (not necessarily database data; it could be system config data, for example), you would still have it!!