S3 bucket as backend for sync

I’d love to see S3 supported as a sync backend in addition to Dropbox/OneDrive/etc.

For corporate security, I’m not allowed to sync sensitive data via third-party services, but an S3 bucket in our account would be allowed.

6 Likes

I doubt that Laurent currently has the time to add another sync target. Furthermore, I'm not sure how many people (who aren't backed by a company) have access to an S3 bucket.

The idea is not a bad one though. The thing is there are a bunch of interesting backends, like SWIFT, IPFS, even Keybase.

If you find someone to implement the S3 sync target who is willing to maintain it, I'm confident Laurent would accept a PR. I believe adding an additional sync target should be pretty straightforward. I'd be interested in looking into it, but I don't have access to S3. Also, the code would have to be maintained in the future and I'm not sure I'd want to do that.

The whole reason for E2EE is to be able to use 3rd party services, so the above argument doesn't really hold.

Either your corporate IT has no clue or they are just ignorant. Either way that argument is hardly a motivator to implement an S3 backend. So they trust Amazon more than others? Why? Have they read what Amazon employees do with their customers' data? Please note that the last paragraph is just my own opinion and that I don't speak for Laurent (the dev).

1 Like

Either your corporate IT has no clue or they are just ignorant. Either way that argument is hardly a motivator to implement an S3 backend. So they trust Amazon more than others?

The issue is data ownership and encryption. S3 objects can be encrypted with a customer-managed key, where no one, not even Amazon, can decrypt them without said key. OneDrive, Dropbox, etc. don't support such an option.

The rest of your comments are fair.

1 Like

They don't support the option but Joplin does via its end-to-end encryption algorithm. Have a look at this page for more info about E2EE in Joplin: https://joplinapp.org/e2ee/

For S3, maybe you could sync your S3 content with a directory on your computer, then sync Joplin with that directory via filesystem sync.

I would also like this feature, so I started working on a fork here: https://github.com/alexchee/joplin/tree/aws_s3_sync

It is able to write to the S3 bucket, but it fails when trying to sync from S3 to local, with this warning: "Remote has been deleted between now and the delta() call? In that case it will be handled during the next sync"

I do not have much experience with Electron or React Native, and I’m still learning the Joplin codebase, so I could use some help here.

2 Likes

That’s pretty impressive, and you’ve even added unit tests for your sync target, which is great.

This error message means there’s a discrepancy between what’s returned from delta() and what’s actually on the sync target. For example it would happen if delta() returns a file that doesn’t exist.

Are you able to consistently replicate this issue? If so, I would check which file it’s happening on, and see if it exists on S3. Also check what’s returned by list() and delta() and see if the listed files are on S3. Maybe you delete a file, then call list() and it returns cached data that includes the deleted file.

More rarely, this error can happen due to a race condition, but as the message says it will be resolved on next sync. If you often see this message it’s probably a bug.
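One way to debug this — a sketch, assuming delta() and list() both resolve to arrays of `{ path }` items, which is roughly how the file API drivers shape their results — is to diff the two and log any delta entry with no matching object:

```javascript
// Sketch: find items reported by delta() that list() doesn't know about.
// Any path returned here is the kind of discrepancy that triggers the
// "Remote has been deleted between now and the delta() call?" warning.
function findPhantomDeltaItems(deltaItems, listedItems) {
  const listed = new Set(listedItems.map((item) => item.path));
  return deltaItems.filter((item) => !item.isDeleted && !listed.has(item.path));
}

// Example with made-up paths:
const phantoms = findPhantomDeltaItems(
  [{ path: 'a.md' }, { path: 'b.md' }],
  [{ path: 'a.md' }]
);
console.log(phantoms); // [ { path: 'b.md' } ] — in delta() but missing from list()
```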

1 Like

Has there been any additional progress or movement on this? I stopped being a front-end developer a long time ago, so I won’t really be able to contribute with the work, but perhaps I can help sponsor the rest of the work on this? I’d really, really love having an s3-compatible backend for Joplin. It’d simplify some things and also provide some resiliency.

cc @laurent

Thanks for the feedback, I’ve been sidetracked by another project. I’ll get back to fixing it this week and see how far I’ll get.

I’m pretty sure it’s a bug from me trying to support an S3 key prefix for the items. Probably my list() or delta() is not taking the key prefix into account in the paths. The error happens for every file, so that’s probably it. Also, it’s pretty easy to develop with the handy tests already in there.
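For what it's worth, that kind of prefix bug usually comes down to keeping two helpers symmetric: one that prepends the prefix when talking to S3, and one that strips it before returning paths to Joplin. A sketch, with a hypothetical prefix value:

```javascript
// Sketch: symmetric key-prefix handling. list()/delta() must return paths
// *without* the prefix, while every S3 call must add it back on.
const prefix = 'joplin/'; // hypothetical key prefix

function toS3Key(path) {
  return prefix + path;
}

function fromS3Key(key) {
  return key.startsWith(prefix) ? key.slice(prefix.length) : key;
}

// The round-trip must be the identity, otherwise delta() and the
// synchronizer disagree about which files exist on the sync target:
const path = '6530a2f099b3423fb5d2254260b6dffa.md';
console.log(fromS3Key(toS3Key(path)) === path); // true
```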

I’ve gotten further. I am able to sync resources and notes through S3 on Electron, but it errors with encryption enabled. A few tests fail, involving checking whether items are synced and decrypting them.
I also haven’t tested it out on React Native, where I predict some issues with get and put with the file driver.
Hopefully, I’ll be able to get it to a decent state this week to make a PR and ask for input.

1 Like

This is great news, even though you’ve hit a few roadblocks. I don’t know anything about S3, so I’m not gonna be of much help, but I just wanted to say that adding another sync target is a great experience and very helpful to others.

I also don’t know how many non-business users actually have access to S3 buckets.

Actually, I think it is working with encryption. I think the tests are failing because I had to implement my own way to upload and download from S3 instead of using the uploadBlob and fetchBlob shims. I’m using aws-sdk to do all the S3 work; basically S3 is like a key-value store.

I’m currently assuming options.source === 'file' for put means I need to upload the file from options.path to S3. For get, I’m downloading the data and writing it to options.path and resolving with a mocked Response object (like in FileApiDriverDropbox).

These lines in https://github.com/laurent22/joplin/blob/v1.0.194/CliClient/tests/synchronizer.js#L42-L43

const remoteContent = await fileApi().get(file.path);
const content = await BaseItem.unserialize(remoteContent);

will always unserialize every remote item, even if it’s a Resource. In my case, the resource is just a Response object that can’t be unserialized.

I get this error, on 'synchronizer should create remote items with UTF-8 content':

Message:
    Expected Error: Invalid property format: 1: 1 to be null. Tip: To check for deep equality, use .toEqual() instead of .toBe().
  Stack:
    Error: Expected Error: Invalid property format: 1: 1 to be null. Tip: To check for deep equality, use .toEqual() instead of .toBe().
        at <Jasmine>
        at localNotesFoldersSameAsRemote (/Users/alexchee/projects/joplin-fork/CliClient/tests-build/synchronizer.js:85:16)
        at <Jasmine>
        at processTicksAndRejections (internal/process/task_queues.js:97:5)

I’m not sure if this is an issue with the test for remote items or with my implementation. Does my implementation of get and put sound compatible?

related code:

1 Like

In Joplin, a resource is a metadata file (a plain text file) and a binary blob, that contains the file data. For example, an image could be stored like this on the sync target:

6530a2f099b3423fb5d2254260b6dffa.md # metadata file, a UTF-8 encoded text file
.resources/6530a2f099b3423fb5d2254260b6dffa # binary blob

In the test you've mentioned, the item that's being unserialised would be 6530a2f099b3423fb5d2254260b6dffa.md and it should unserialise successfully.

So I think the issue in your case, is that you are somehow returning binary data where you should return a UTF-8 encoded text file. Maybe add some console statements to see exactly what file you are returning and how you are encoding it.
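That distinction can be expressed as a small helper — a sketch, assuming paths are relative to the sync root as in the layout above:

```javascript
// Sketch: only items under .resources/ are binary blobs; everything else
// (the *.md metadata files) must come back as UTF-8 text, or
// BaseItem.unserialize() will fail on it.
function isResourceBlob(path) {
  return path.startsWith('.resources/');
}

function decodeRemoteContent(path, rawBuffer) {
  return isResourceBlob(path) ? rawBuffer : rawBuffer.toString('utf-8');
}

const meta = decodeRemoteContent(
  '6530a2f099b3423fb5d2254260b6dffa.md',
  Buffer.from('id: 6530a2f099b3423fb5d2254260b6dffa')
);
console.log(typeof meta); // 'string' — safe to pass to unserialize
```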

Ok, I think I know the issue. I think my implementation of list is wrong. Since S3 is just a key-value store, it’ll return everything under a given path recursively. So when fileApi().list() is called, it returns all *.md files as well as everything in .resources and .sync.

Before I change it to a shallow listing behavior, should list be listing stuff recursively?
Nvm, I just changed the list behavior to not be recursive, and more tests are passing. Just got to figure out why the others are failing.

Thanks for your help! Getting closer now.

I think it’s at a good enough point for a PR to get more eyes on it. https://github.com/laurent22/joplin/pull/2815

If you need creds for an AWS bucket I can make one for you under my account.

1 Like

Indeed list() should not be recursive

could someone post instructions for how to set things up using the S3 option in the configuration settings?

I want to use the S3 API on the Backblaze storage system, and I expected to need to specify an endpoint URL in addition to a bucket name, key, and secret, but I am not asked for the endpoint.

I see S3 as beta in the iOS app (how do I find the version number of the iOS app?),
while desktop 1.3.18 does not mention this option.

Many thanks for your great work. It's really awesome to have AWS included as well!
Is it possible to use the S3 sync.target with the terminal version of Joplin too? If so, what is the sync.target number? Sorry, I found nothing in the docs so far.

When it comes to owning your data, supporting solutions such as minio (open source, S3 compatible) would be awesome, as you can host minio yourself.
I see that the S3 support is beta right now, and I hope the implementation will be a standard one and not limited to just AWS S3.
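For what it's worth, the aws-sdk client itself already supports this: pointing it at any S3-compatible server is mostly a matter of two config options. A sketch — the endpoint and credentials below are made-up placeholders for a local minio instance:

```javascript
// Sketch: client config for an S3-compatible server such as minio or Wasabi.
// With the aws-sdk (v2) package you'd pass this object to `new AWS.S3(config)`.
const config = {
  endpoint: 'http://localhost:9000',  // hypothetical local minio endpoint
  accessKeyId: 'minioadmin',          // placeholder credentials
  secretAccessKey: 'minioadmin',
  s3ForcePathStyle: true,             // minio serves buckets as /bucket/key, not bucket.host
  signatureVersion: 'v4',
};
console.log(config.endpoint);
```

So exposing an endpoint field in Joplin's sync settings would be enough to cover minio, Wasabi, Backblaze B2, and other S3-compatible backends.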

Does it work on Android? I created a bucket on Wasabi (S3 compatible) and it worked on desktop (Linux), but the same configuration parameters fail on Android.