Coding Period Update - Week 5

Repository: GitHub - khuongduy354/joplin-sync-lib
Demo: joplin-sync-lib/src/sample_app at master · khuongduy354/joplin-sync-lib · GitHub

Progress

  • Implemented the Update item operation, with an update flow based on the previous sync timestamp.
Clients A and B have access to the same remote. To update an item, a client must provide the newer content and the timestamp of its previous latest sync for that item (lastSync). In the Joplin SQL schema it's item.sync_time.
The client has to track this value itself; providing an inaccurate lastSync may cause data corruption on the remote.
These are the cases when writing to remote:

Assuming at t1, clients A, B, and remote are synced.

At t2, A wants to make a write request to the remote:
  1. Remote is newer (remote.updated_time > lastSync)
    -> Client B has written to the remote during the [t1,t2] period
    -> Abort the update and notify the client to pull the new changes and resolve conflicts (if there are unsynced changes made locally)
    -> After that, the client can provide a newer lastSync and make the update request again

  2. Remote timestamp is exactly equal to lastSync (remote.updated_time == lastSync)
    -> Client B hasn't written to the remote during the [t1,t2] period
    -> Apply the update on the remote as normal
    -> Client A receives a new timestamp from the update and saves it as its new lastSync

  3. Remote is older (remote.updated_time < lastSync)
    -> This shouldn't be possible; A has provided an incorrect timestamp.
    -> The client should pull changes and check carefully before making a new update request.

  4. The item was created by client B and hasn't been synced to A yet (client A has no initial lastSync)
    -> A can update the item by retrieving its updated_time from the remote and using it as the lastSync parameter
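
The four cases above can be sketched as a single check on the remote side. This is only an illustration, not the library's actual API: `tryUpdate`, `RemoteItem`, `UpdateOutcome` and the in-memory `store` are hypothetical names, and the remote is assumed to hold `updated_time` as Unix milliseconds.

```typescript
// Hypothetical sketch of the lastSync check; none of these names come from joplin-sync-lib.
interface RemoteItem {
  id: string;
  body: string;
  updated_time: number; // Unix milliseconds (assumed)
}

type UpdateOutcome =
  | { status: "applied"; newLastSync: number }
  | { status: "conflict"; remoteUpdatedTime: number }
  | { status: "invalidLastSync"; remoteUpdatedTime: number };

function tryUpdate(
  store: Record<string, RemoteItem>, // stands in for the remote sync target
  id: string,
  newBody: string,
  lastSync: number,
  now: number = Date.now(),
): UpdateOutcome {
  const current = store[id];
  if (!current) throw new Error("Item " + id + " does not exist on the remote");

  if (current.updated_time > lastSync) {
    // Case 1: another client wrote after our last sync. Abort so the caller can pull first.
    return { status: "conflict", remoteUpdatedTime: current.updated_time };
  }
  if (current.updated_time < lastSync) {
    // Case 3: should not happen. The caller supplied a wrong lastSync.
    return { status: "invalidLastSync", remoteUpdatedTime: current.updated_time };
  }
  // Case 2: timestamps match, so nobody else has written. Apply and hand back the new timestamp.
  store[id] = { id: id, body: newBody, updated_time: now };
  return { status: "applied", newLastSync: now };
}

// Usage: A and B both synced when the item had updated_time 1000.
const demoStore: Record<string, RemoteItem> = {
  n1: { id: "n1", body: "v1", updated_time: 1000 },
};
const fromA = tryUpdate(demoStore, "n1", "edited by A", 1000, 2000); // applied
const fromB = tryUpdate(demoStore, "n1", "edited by B", 1000, 3000); // conflict, B must pull first
```

Case 4 needs no extra branch in this sketch: a client without a stored lastSync reads the item's current `updated_time` from the remote and passes that value in.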
  • Unit tests added for: Locks, Synchronizer createItems, getItemsMetadata (delta), getItem
  • Implementing OCR service: this part isn't finished yet; there are still some problems with polling for new items. The flow is as follows:
  1. The OCR service polls for new items every X interval. To detect new items, it saves a timestamp; any item with an updated_time newer than that timestamp is considered "new".
  2. If the item is a resource -> check its OCR fields -> if empty, spin up OcrService (from the Joplin package) -> process the data
  3. The new data is written to the remote with the Update method above. For every batch of items, the service tracks the latest timestamp and uses it to look for new items in the next poll.
  • Bug fixes and enhancements: convert ISO timestamps to Unix milliseconds (unixMs()) so that delta works correctly; unserialize from string to item object
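
The polling step of the OCR flow, together with the ISO-to-milliseconds conversion, could look roughly like the sketch below. It is a simplified assumption of how the selection might work, not the code in the repository: `ItemMeta`, `isoToUnixMs`, `selectForOcr` and the `ocr_text` field name are placeholders chosen for illustration.

```typescript
// Hypothetical sketch; field and function names are assumptions, not the library's API.
interface ItemMeta {
  id: string;
  type: "note" | "resource";
  updated_time: string; // ISO 8601 string, as found in a serialized item
  ocr_text: string; // assumed OCR field; empty means "not processed yet"
}

// Convert an ISO 8601 string to Unix milliseconds so timestamps compare numerically.
function isoToUnixMs(iso: string): number {
  const ms = Date.parse(iso);
  if (isNaN(ms)) throw new Error("Invalid ISO timestamp: " + iso);
  return ms;
}

// One polling pass: keep items updated after the cursor, pick the resources that
// still need OCR, and advance the cursor to the newest timestamp seen in the batch.
function selectForOcr(
  items: ItemMeta[],
  cursor: number,
): { pending: ItemMeta[]; nextCursor: number } {
  const pending: ItemMeta[] = [];
  let nextCursor = cursor;
  for (const item of items) {
    const updated = isoToUnixMs(item.updated_time);
    if (updated <= cursor) continue; // seen in an earlier poll
    if (updated > nextCursor) nextCursor = updated;
    if (item.type === "resource" && item.ocr_text === "") pending.push(item);
  }
  return { pending: pending, nextCursor: nextCursor };
}
```

A resource written back after OCR gets a newer updated_time and therefore shows up again in the next poll, but in this sketch it is skipped because its OCR field is no longer empty.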

Plan

Finished project overview

I'll restructure my plan, since I didn't get the priorities right in the proposal. I think this is what a finished project should look like:

  • CRUD on remote target:

    • Create, Read and Update are working well
    • Delete operation: I'm still looking for a use case for this operation
    • Downloading blobs
    • Upload blobs
  • Encryption:

    • Encrypt data before upload
    • Handle data when the E2E state is toggled on/off across clients
  • A few example use cases demonstrated with code:

    • Mail to notes (push only service)
    • Aggregate notes into todo list (pull only service)
    • OCR Server-side (push and pull service)
  • Other important features:

    • Locks
    • Share service integration without breaking
    • Input/Output format, and sanitizing
  • Good to have features:

    • Helper methods for databases
    • Batching/Pagination
  • Documentation

Next week plan

  • E2E: At first, I thought E2E was just passing master keys across clients, but there is also the problem of E2E being turned on/off, which causes the entire data set to be re-encrypted and re-synced.
    I'll look into it more thoroughly.
  • Continue OCR service implementation
  • Share service integration: I haven't used the share service yet; I think it needs to keep track of some fields while syncing.
  • Adding documentation: for delta, update operations, and E2E (if possible).
  • Cleaning up unimplemented TODOs
  • Applying delta to SQLite: this is a low priority, I think adding docs on how to delta is more flexible.
  • Adding more unit tests

Issues

  • E2E: as mentioned above, in my current implementation the sync library enables E2E if the remote has a ppk. However, there are more complex cases, such as when the E2E keys on the remote have changed and differ from the client's key.
    I'll look more into it.

I think calling it an "OCR service" may give the wrong impression to those who are not familiar with the project. It's not really an OCR service, only the part required to pull changes from the sync target, make a change and push it back.


Delete large resources for instance. Or delete completed TODOs older than a certain threshold. Or 2-way sync with some other note/todo service.

Ultimately you don't need to have a use case, whatever Joplin supports, the API must support as well.
