Repository: GitHub - khuongduy354/joplin-sync-lib
Demo: joplin-sync-lib/src/sample_app at master · khuongduy354/joplin-sync-lib · GitHub
Progress
- Implemented the Update operation for items, with an update flow based on the timestamp of the previous sync.
Clients A and B have access to the same remote. To update an item, a client must provide the newer content and the timestamp of its last successful sync for that item (lastSync). In the Joplin SQL schema this is item.sync_time.
The client has to track this value itself; providing an inaccurate lastSync may corrupt data on the remote.
These are the possible cases when writing to the remote.
Assume that at t1, clients A and B and the remote are all in sync.
At t2, A wants to make a write request to the remote:
1. Remote is newer (remote.updated_time > lastSync)
-> Client B has written to the remote during the [t1, t2] period
-> Abort the update and notify the client to pull the new changes and resolve conflicts (if there are unsynced local changes)
-> After that, the client can provide a newer lastSync and make the update request again
2. Remote matches lastSync exactly (remote.updated_time == lastSync)
-> Client B hasn't written to the remote during the [t1, t2] period
-> Apply the update on the remote as normal
-> Client A receives a new timestamp from the update and saves it as its new lastSync
3. Remote is older (remote.updated_time < lastSync)
-> This shouldn't be possible; A has provided an incorrect timestamp.
-> The client should pull changes and check carefully before making a new update request.
4. The item was created by client B and hasn't been synced to A yet (client A has no initial lastSync)
-> A can update the item by retrieving its updated_time from the remote and using it as the lastSync parameter
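The four cases above can be sketched as a single compare-then-write check. This is only an illustration of the decision logic, not the actual joplin-sync-lib API: the names tryUpdate, RemoteItem and UpdateDecision are hypothetical, the remote is modelled as an in-memory Map, and timestamps are assumed to be Unix milliseconds.

```typescript
// Hypothetical types for illustration; not the joplin-sync-lib API.
interface RemoteItem {
  id: string;
  body: string;
  updated_time: number; // assumed to be Unix milliseconds
}

type UpdateDecision =
  | { status: "applied"; newSyncTime: number }
  | { status: "conflict"; remoteUpdatedTime: number }
  | { status: "inaccurate_timestamp"; remoteUpdatedTime: number };

// Compare the remote's updated_time with the caller's lastSync and
// only write when they match exactly.
function tryUpdate(
  remote: Map<string, RemoteItem>,
  id: string,
  newBody: string,
  lastSync: number,
  now: number,
): UpdateDecision {
  const current = remote.get(id);
  if (!current) {
    throw new Error(`Item ${id} not found on remote`);
  }

  if (current.updated_time > lastSync) {
    // Case 1: another client wrote after our last sync. Abort; caller must pull.
    return { status: "conflict", remoteUpdatedTime: current.updated_time };
  }

  if (current.updated_time < lastSync) {
    // Case 3: should not happen; the caller supplied a wrong lastSync.
    return { status: "inaccurate_timestamp", remoteUpdatedTime: current.updated_time };
  }

  // Case 2: nobody else wrote since our last sync, so the write is safe.
  remote.set(id, { ...current, body: newBody, updated_time: now });
  return { status: "applied", newSyncTime: now };
}
```

For case 4, a client without a stored lastSync would first read the item's updated_time from the remote and pass that value as lastSync. On "applied", the caller stores newSyncTime as its new lastSync for the item.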
- Unit tests added for: Locks, Synchronizer createItems, getItemsMetadata (delta), getItem
- Implementing the OCR service: this part isn't finished yet; there are still some problems with polling for new items. The flow is as follows:
  - The OCR service polls at a fixed interval for new items. To detect them, it saves a timestamp; any item with an updated_time newer than that timestamp is considered "new"
  - If the item is a resource -> check its OCR fields -> if they are empty, spin up OcrService (from the Joplin package) -> process the data
  - The new data is written to the remote with the Update method above. For every batch of items, the service tracks the latest timestamp, which it uses to look for new items in the next poll.
- Bug fixes and enhancements: converting ISO timestamps to Unix milliseconds (unixMs()) so that delta works correctly, and unserializing strings into item objects
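The selection step of the OCR polling flow described above could look roughly like the sketch below. It is a simplified illustration rather than the library's code: selectForOcr, PolledItem and PollResult are hypothetical names, isResource stands in for whatever type check the real items need, and ocr_text is assumed to be the field that is empty until OCR has run. Items are treated as new when their updated_time is later than the saved checkpoint.

```typescript
// Hypothetical shapes for illustration; only updated_time is taken from the text above.
interface PolledItem {
  id: string;
  isResource: boolean; // stand-in for the real item type check
  ocr_text: string; // assumed to be empty until OCR has run
  updated_time: number; // assumed to be Unix milliseconds
}

interface PollResult {
  toProcess: PolledItem[]; // resources that still need OCR
  nextCheckpoint: number; // latest updated_time seen in this batch
}

function selectForOcr(items: PolledItem[], checkpoint: number): PollResult {
  // Only items changed after the saved checkpoint count as new.
  const fresh = items.filter((item) => item.updated_time > checkpoint);

  // Of those, only resources with empty OCR fields need processing.
  const toProcess = fresh.filter((item) => item.isResource && item.ocr_text === "");

  // Track the latest timestamp in the batch for the next poll.
  const nextCheckpoint = fresh.reduce(
    (latest, item) => Math.max(latest, item.updated_time),
    checkpoint,
  );

  return { toProcess, nextCheckpoint };
}
```

One consequence of this design: writing the OCR result back bumps the item's updated_time, so the item shows up again in the next poll, but it is skipped there because its OCR field is no longer empty.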
Plan
Finished project overview
I'll restructure my plan, since I didn't get the priorities right in the proposal. I think this is what a finished project should look like:
- CRUD on remote target:
  - Create, Read, and Update are working well
  - Delete operation: I'm still looking for a use case for this operation
  - Downloading blobs
  - Uploading blobs
- Encryption:
  - Encrypt data before upload
  - Handle data when the E2E state is toggled on/off across clients
- A few example use cases demonstrated with code:
  - Mail to notes (push-only service)
  - Aggregate notes into a todo list (pull-only service)
  - Server-side OCR (push and pull service)
- Other important features:
  - Locks
  - Share service integration without breaking it
  - Input/output format and sanitizing
- Good-to-have features:
  - Helper methods for databases
  - Batching/pagination
- Documentation
Next week plan
- E2E: At first, I thought E2E was just a matter of passing master keys across clients, but there is also the problem of E2E being turned on/off, which causes all data to be re-encrypted and re-synced. I'll look into it more thoroughly.
- Continue the OCR service implementation
- Share service integration: I haven't used the share service yet; I think it needs to keep track of some fields while syncing.
- Adding documentation: for delta, update operations, and E2E (if possible).
- Cleaning up unimplemented TODOs
- Applying delta to SQLite: this is low priority; I think adding docs on how to apply a delta is more flexible.
- Adding more unit tests
Issues
- E2E: as mentioned above, my current implementation enables E2E in the sync library if the remote has a ppk. However, there are more complex cases, for example when the E2E keys on the remote have changed and differ from the client's keys. I'll look more into it.