Python script for importing Kindle highlights into Joplin

Hi everyone, thanks for your great work developing Joplin. I created a Python script that imports highlights and annotations from a Kindle eReader device into Joplin. Please feel free to test it and/or improve it.

It creates a notebook in Joplin and individual notes for each book inside that notebook. All the highlights and annotations for each book are stored in the same note and separated by a horizontal line. When a note for a book already exists, this script updates the note with the new highlights.
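A minimal sketch of the update step, assuming the note body is simply the highlights joined by a Markdown horizontal rule (`---`). `merge_highlights` and `SEPARATOR` are hypothetical names used for illustration, not functions from the script:

```python
from typing import List

# Assumed note layout: highlights joined by a Markdown horizontal rule.
SEPARATOR = "\n\n---\n\n"


def merge_highlights(existing_body: str, new_highlights: List[str]) -> str:
    """Append only the highlights that are not already in the note body."""
    # Split the current note back into individual highlights, ignoring blanks.
    current = [h for h in existing_body.split(SEPARATOR) if h.strip()]
    seen = set(current)
    for highlight in new_highlights:
        if highlight not in seen:
            current.append(highlight)
            seen.add(highlight)  # also skips duplicates inside new_highlights
    return SEPARATOR.join(current)


body = merge_highlights("", ["First highlight", "Second highlight"])
body = merge_highlights(body, ["Second highlight", "Third highlight"])
print(body)
```

Because only unseen highlights are appended, running the sync twice with the same clippings leaves the note unchanged.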

Requirements

Download this package

git clone https://gitlab.com/seawind/kindle-highlights-in-joplin.git

Getting your Kindle highlights and annotations into Joplin

  1. Go to the kindle-highlights-in-joplin folder.
  2. Open the file conf.py and enter your Joplin token.
     • To find your Joplin token, open Joplin and go to Tools -> Options -> Web Clipper. Enable the Web Clipper service and copy your authorisation token into the token variable in conf.py.
  3. Choose a name for the Joplin notebook in which you want to store your highlights and enter it in the folder variable in conf.py.
  4. Connect your Kindle device and find the path to the file where your highlights and notes are stored. It is usually inside the Documents folder. The file name depends on the language of your device: in English it is My Clippings.txt, in Spanish Mis recortes.txt.
  5. Copy that path into the path_highlights variable in conf.py.

The file conf.py looks like this:

token = 'your_token'
folder = 'Kindle highlights'
path_highlights = '/path/to/your/Kindle/My Clippings.txt'

  6. Open a terminal in the kindle-highlights-in-joplin folder and run the following command:
python sync_kindle_joplin.py
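The token is what lets a script talk to Joplin's Data API on the Web Clipper port (41184 by default). Below is a hypothetical sketch of finding or creating the notebook named in `folder`, assuming the paginated response format (`items`, `has_more`) of the newer Data API. `joplin_request` and `find_or_create_folder` are illustrative names, not part of the script or of joplin-api:

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Optional

BASE_URL = "http://localhost:41184"  # default Web Clipper port


def joplin_request(method: str, path: str, token: str,
                   params: Optional[dict] = None, body: Optional[dict] = None) -> dict:
    """Send one request to the Data API, passing the token as a query parameter."""
    query = dict(params or {}, token=token)
    url = f"{BASE_URL}{path}?{urllib.parse.urlencode(query)}"
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def find_or_create_folder(title: str, token: str,
                          request: Callable = joplin_request) -> str:
    """Return the ID of the notebook called `title`, creating it if it is missing."""
    page = 1
    while True:
        # Assumed paginated format: {"items": [...], "has_more": bool}
        result = request("GET", "/folders", token, params={"page": page})
        for folder in result.get("items", []):
            if folder.get("title") == title:
                return folder["id"]
        if not result.get("has_more"):
            break
        page += 1
    created = request("POST", "/folders", token, body={"title": title})
    return created["id"]
```

The `request` parameter is injectable, so the lookup logic can be tried with a stub instead of a running Joplin instance.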

Considerations

  • This script has been tested with the highlights file from a Kindle D01100. The file format might differ on other models.

  • Kindle eReaders usually store the highlights and notes in a txt file. The name of the file depends on the language in which the device is configured.

  • The entries in that file follow a specific structure: a fixed sequence of characters separates individual highlights. The metadata for each highlight is written in the language of the device. This package uses regular expressions to extract the metadata and build a unique identifier for each highlight. So far, it includes keywords in Spanish and English to identify the relevant data; you can add keywords for other languages in the kindle_conf module.
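A minimal parsing sketch, assuming the English layout of My Clippings.txt: a `Title (Author)` line, a metadata line containing `Location N-M`, a blank line, the highlight text, then a line of ten `=` signs. The `KEYWORDS` table only stands in for what kindle_conf might hold, and hashing title, location and text into an ID is an assumed scheme, not necessarily the one the package uses:

```python
import hashlib
import re
from typing import Dict, List

# Line that ends every clipping (assumed: ten '=' signs).
SEPARATOR = "=========="

# Hypothetical keyword table; other languages would add their own words here.
KEYWORDS = {"location": r"[Ll]ocation"}


def parse_clippings(raw: str) -> List[Dict[str, str]]:
    """Split a My Clippings.txt dump into dicts with a stable ID per highlight."""
    entries = []
    # Kindle files often start with a byte-order mark; drop it.
    for block in raw.replace("\ufeff", "").split(SEPARATOR):
        lines = [line.strip() for line in block.strip().splitlines()]
        if len(lines) < 2:
            continue  # e.g. the empty block after the last separator
        header, meta = lines[0], lines[1]
        text = "\n".join(line for line in lines[2:] if line)

        # "Title (Author)": the author is the last parenthesised group.
        match = re.match(r"^(.*?)\s*\(([^()]*)\)$", header)
        title, author = (match.group(1), match.group(2)) if match else (header, "")

        # Metadata line, e.g. "- Your Highlight on page 12 | Location 180-182 | ..."
        loc = re.search(KEYWORDS["location"] + r"\s+(\d+(?:-\d+)?)", meta)
        location = loc.group(1) if loc else ""

        # Assumed ID scheme: SHA-1 of title, location and text.
        key = f"{title}|{location}|{text}".encode("utf-8")
        entries.append({
            "title": title,
            "author": author,
            "location": location,
            "text": text,
            "id": hashlib.sha1(key).hexdigest(),
        })
    return entries
```

To use it on the device file, read it with `open(path_highlights, encoding="utf-8-sig")` and pass the contents to `parse_clippings`.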


This script is based on the great work of @foxmask who created the joplin-api for Python.

Hello, since joplin-api does not support the new API, the script only works correctly with Joplin 1.3.x and earlier!
The first import works fine, but after that, all notes are imported again every time!

Hi @JackGruber, thanks for your comment and for taking the time to test it. I have Joplin 1.4.19 and developed this tool based on that version. I also noticed that the Python package joplin-api does not support some features of the new Joplin API, for example retrieving all the notes from Joplin, but this tool does not rely on those features.
Were you able to retrieve the highlights from your Kindle device? Were the highlights imported again and duplicated every time? If so, could you tell me how to reproduce it?
Thanks

Ah ok, you search for each title separately to avoid the pagination.
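A sketch of that per-title lookup, assuming the Data API's `/search` endpoint accepts the `query`, `type` and `fields` parameters. Since search matches text rather than exact titles, the results are filtered locally; `search_url` and `pick_note` are hypothetical helpers, and titles containing characters that Joplin's search syntax treats specially are not escaped here:

```python
import urllib.parse
from typing import Dict, List, Optional

BASE_URL = "http://localhost:41184"  # default Web Clipper port


def search_url(title: str, token: str) -> str:
    """Build one /search request for a single book title."""
    params = {
        "query": title,
        "type": "note",
        "fields": "id,title,parent_id",  # assumed to be supported by /search
        "token": token,
    }
    return f"{BASE_URL}/search?{urllib.parse.urlencode(params)}"


def pick_note(results: List[Dict[str, str]], title: str, folder_id: str) -> Optional[str]:
    """Return the ID of the note whose title matches exactly in the given notebook."""
    for note in results:
        if note.get("title") == title and note.get("parent_id") == folder_id:
            return note["id"]
    return None
```

A single title rarely returns more than one page of results, which is presumably why searching per book sidesteps pagination.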

Not sure, but on my first tries it created every book note again each time; at the moment it doesn't ...
I have created a pull request.

I have been playing with this using Joplin 1.7.11. There are some small problems with the regexes used to extract the title, author, and so on, which I have fixed; see my comments on GitHub. But once all that is done, it crashes with an error I find inexplicable, about finding a float where a string is expected in the highlights.

I can see nothing to suggest there is anything other than a collection of strings in the highlights. But I don't understand pandas at all.
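One common way to get this error with pandas, assuming the highlights end up in a DataFrame built from parsed records: any entry without text (a bookmark, for example) becomes `NaN`, and `NaN` is a float. This is a guess at the cause, not a diagnosis of this specific file; the records below are made up for illustration:

```python
import pandas as pd

records = [
    {"title": "Dune", "text": "Fear is the mind-killer."},
    {"title": "Dune"},  # e.g. a bookmark entry that has no text
]
df = pd.DataFrame(records)

# The missing "text" value is stored as NaN, which is a float, not a string.
missing = df.loc[1, "text"]

error_message = ""
try:
    "\n".join(df["text"])  # fails when it reaches the NaN
except TypeError as err:
    error_message = str(err)

# Drop rows without text; df["text"].fillna("") would keep them as empty strings.
clean_text = "\n".join(df.dropna(subset=["text"])["text"])
print(type(missing).__name__, "|", error_message, "|", clean_text)
```

Checking the highlights file for entries with an empty text section would confirm or rule out this cause.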

Hi @seatrout
Thanks for your contribution with regexes. Would you mind sharing a section of your highlights file so I can identify the issue?