Python script for importing Kindle highlights into Joplin

escafandra · 24 December 2020 18:22

Hi everyone, thanks for your great work developing Joplin. I created a Python script that imports highlights and annotations from a Kindle eReader device into Joplin. Please feel free to test it and/or improve it.

It creates a notebook in Joplin and individual notes for each book inside that notebook. All the highlights and annotations for each book are stored in the same note and separated by a horizontal line. When a note for a book already exists, this script updates the note with the new highlights.
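A minimal sketch of the grouping and update logic described above. It assumes highlights arrive as (title, text) pairs and that the separator is a Markdown horizontal rule (`---`). `group_by_book`, `merge_note_body` and `SEPARATOR` are hypothetical names, not the script's own. The sketch also compares raw highlight text, whereas the script builds a unique identifier from each highlight's metadata.

```python
from collections import defaultdict

# Assumed separator between highlights: a Markdown horizontal rule.
SEPARATOR = "\n\n---\n\n"


def group_by_book(highlights):
    """Group (title, text) pairs into {title: [text, ...]}, keeping file order."""
    books = defaultdict(list)
    for title, text in highlights:
        books[title].append(text)
    return dict(books)


def merge_note_body(existing_body, new_texts):
    """Return a note body with new highlights appended, skipping ones already present."""
    merged = [part for part in existing_body.split(SEPARATOR) if part]
    for text in new_texts:
        if text not in merged:
            merged.append(text)
    return SEPARATOR.join(merged)
```

For example, `merge_note_body("first", ["first", "second"])` keeps `first` once and appends `second`, which is the behaviour you want when a note for the book already exists.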

Requirements

Python 3.7+
httpx
pandas
dateparser
joplin_api

Download this package

git clone https://gitlab.com/seawind/kindle-highlights-in-joplin.git

Getting your Kindle highlights and annotations into Joplin

Go to the kindle-highlights-in-joplin folder.
Open the file conf.py and enter your Joplin token.

To find your Joplin token, open Joplin and go to Tools -> Options -> Web Clipper. Enable the Web Clipper service and copy your authorisation token into the token variable in conf.py.
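Before running the script, you can check that the Web Clipper service is reachable. This sketch assumes the service listens on Joplin's default port, 41184; `api_url` and `ping` are hypothetical helpers, not part of the script. It relies on two documented behaviours of the Joplin Data API: the token is passed as a `token` query parameter, and `GET /ping` answers `JoplinClipperServer`.

```python
import urllib.parse
import urllib.request

# Joplin's default Web Clipper port; adjust it if you changed it in the options.
JOPLIN_BASE = "http://localhost:41184"


def api_url(path, token, **params):
    """Build a Data API URL; Joplin expects the token as a query parameter."""
    query = urllib.parse.urlencode({"token": token, **params})
    return f"{JOPLIN_BASE}/{path.lstrip('/')}?{query}"


def ping(timeout=5):
    """Return True if the Web Clipper service answers GET /ping."""
    try:
        with urllib.request.urlopen(f"{JOPLIN_BASE}/ping", timeout=timeout) as resp:
            return resp.read().decode("utf-8") == "JoplinClipperServer"
    except OSError:  # connection refused, timeout, etc.
        return False
```

If `ping()` returns False, Joplin is probably not running or the Web Clipper service is disabled. `api_url("folders", token)` gives the URL that lists your notebooks.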

Set the name of the Joplin notebook in which you want to store your highlights.
Connect your Kindle device and find the path to the file where your highlights and notes are stored. It is usually inside the Documents folder. The name of the file depends on the language of your device: in English it is My Clippings.txt, in Spanish Mis recortes.txt.
Copy the path into the path_highlights variable in conf.py.

The file conf.py looks like this:

token = 'your_token'
folder = 'Kindle highlights'
path_highlights = '/path/to/your/Kindle/My Clippings.txt'
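If you are not sure which file name your device uses, a small helper like the following can look for the names mentioned in this post inside the Kindle Documents folder. `find_clippings` and `CLIPPINGS_NAMES` are hypothetical and not part of the package; add the name used by your device language if it is missing.

```python
from pathlib import Path

# File names mentioned in this post; other device languages use other names.
CLIPPINGS_NAMES = ("My Clippings.txt", "Mis recortes.txt")


def find_clippings(documents_dir):
    """Return the first known clippings file in a Kindle Documents folder, or None."""
    for name in CLIPPINGS_NAMES:
        candidate = Path(documents_dir) / name
        if candidate.is_file():
            return candidate
    return None
```

For example, `print(find_clippings("/path/to/your/Kindle/Documents"))` prints the value to paste into path_highlights.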

Open a terminal in the kindle-highlights-in-joplin folder and run the following command:

python sync_kindle_joplin.py

Considerations

This script has been tested with the highlights stored on a Kindle D01100. The file format might differ on other models.
Kindle eReaders usually store the highlights and notes in a txt file whose name depends on the language in which the device is configured.
The entries in that file follow a specific structure: a fixed sequence of characters separates individual highlights, and the metadata for each highlight is written in the language of the device. This package uses regular expressions to retrieve that metadata and build a unique identifier for each highlight. So far, it includes keywords in Spanish and English to identify the relevant data. You can add keywords for a different language in the kindle_conf module.
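To illustrate the kind of parsing described above, here is a sketch for an English-language clippings file. It assumes entries are separated by a line of ten `=` signs and that each entry has a "Title (Author)" line, a metadata line such as "- Your Highlight on page 12 | Location 180-182 | Added on ...", a blank line and the text. The `KEYWORDS` table, `parse_clippings` and the SHA-1 identifier are hypothetical illustrations; the real kindle_conf module and the script's identifier may work differently.

```python
import hashlib
import re

CLIPPING_SEPARATOR = "=========="  # line between entries in My Clippings.txt

# Hypothetical keyword table; kindle_conf may be organised differently.
KEYWORDS = {"en": {"location": "Location", "added": "Added on"}}

# "Book Title (Author Name)": the author is the last parenthesised group.
TITLE_AUTHOR = re.compile(r"^(?P<title>.*?)\s*\((?P<author>[^()]*)\)\s*$")


def parse_clippings(text, lang="en"):
    """Split a clippings file into dicts with metadata and a unique id."""
    kw = KEYWORDS[lang]
    location_re = re.compile(re.escape(kw["location"]) + r"\s+(?P<loc>[\d-]+)")
    added_re = re.compile(re.escape(kw["added"]) + r"\s+(?P<date>.+)$")
    clippings = []
    # Drop byte-order marks, which Kindle files often contain.
    for block in text.replace("\ufeff", "").split(CLIPPING_SEPARATOR):
        lines = [line.strip() for line in block.strip().splitlines()]
        if len(lines) < 2:
            continue  # empty trailing block
        header, meta = lines[0], lines[1]
        match = TITLE_AUTHOR.match(header)
        title, author = (match["title"], match["author"]) if match else (header, "")
        loc, added = location_re.search(meta), added_re.search(meta)
        clippings.append({
            "title": title,
            "author": author,
            "location": loc["loc"] if loc else None,
            "added": added["date"] if added else None,
            "text": "\n".join(lines[2:]).strip(),
            # Title plus the full metadata line identifies a clipping.
            "uid": hashlib.sha1(f"{title}|{meta}".encode("utf-8")).hexdigest(),
        })
    return clippings
```

The "added" string could then be handed to dateparser, which is already a requirement, to get a proper date. Supporting another language would mean adding its keywords to the table.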

This script is based on the great work of @foxmask who created the joplin-api for Python.

JackGruber · 25 December 2020 17:09

Hello, since joplin-api does not support the new API, the script only works correctly with Joplin 1.3.x and earlier, right?
The first import works fine, but after that, all notes are imported again every time!

escafandra · 25 December 2020 17:52

Hi @JackGruber, thanks for your comment and for taking the time to test it. I have Joplin 1.4.19 and developed this tool based on that version. I also realized that the Python package joplin-api does not support some features of the new Joplin API, for example retrieving all the notes from Joplin. But this tool does not rely on those features.
Were you able to retrieve the highlights from your Kindle device? Were the highlights imported again and duplicated every time? If that happened, could you tell me how to reproduce it?
Thanks

JackGruber · 25 December 2020 19:22

Ah OK, you search for each title separately, to avoid pagination.

Not sure, but on my first tries it created every book note again each time; at the moment it doesn't, though...
I have created a pull request.
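For reference, the pagination mentioned above can also be handled directly. Since Joplin 1.4, list endpoints return a dict with `items` and `has_more`, and the next page is requested with the `page` parameter (starting at 1). This sketch is an alternative to the per-title search, not the script's approach; `fetch_all` and `joplin_page` are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request


def fetch_all(fetch_page):
    """Collect every item from a paginated endpoint.

    fetch_page(page) must return a dict like {"items": [...], "has_more": bool}.
    """
    items, page = [], 1
    while True:
        result = fetch_page(page)
        items.extend(result.get("items", []))
        if not result.get("has_more"):
            return items
        page += 1


def joplin_page(path, token, page, base="http://localhost:41184", **params):
    """Fetch one page from the Joplin Data API (requires Joplin to be running)."""
    query = urllib.parse.urlencode({"token": token, "page": page, **params})
    with urllib.request.urlopen(f"{base}/{path}?{query}", timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch_all(lambda p: joplin_page("notes", token, p, fields="id,title"))` would list all note titles. Keeping the page fetcher separate makes the loop easy to test without a running Joplin.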

seatrout · 21 February 2021 13:15

I have been playing with this, using Joplin 1.7.11. There are some small problems with the regexes used to extract the title, author, and so on, which I have fixed. See my comments on GitHub. But once all that is done, it crashes with an error I find inexplicable: a float was found where a string was expected in the highlights.

I can see nothing to suggest there is anything other than a collection of strings in the highlights. But I don't understand pandas at all.
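One common cause of this kind of error (an assumption, not confirmed against seatrout's file) is a missing value. pandas stores a missing entry as NaN, which is a float, so joining a column of strings fails. The sketch below reproduces the symptom and works around it with `fillna`; `join_highlights` and `SEPARATOR` are hypothetical and not part of the script.

```python
import pandas as pd

SEPARATOR = "\n\n---\n\n"  # assumed separator between highlights


def join_highlights(df, column="text"):
    """Join a column of highlights, treating missing values (NaN/None) as empty."""
    values = df[column].fillna("").astype(str)
    return SEPARATOR.join(v for v in values if v)


# A row without a "text" field holds NaN there, so a plain
# SEPARATOR.join(df["text"]) would raise
# "TypeError: sequence item 1: expected str instance, float found".
df = pd.DataFrame([{"text": "a"}, {"title": "no text"}, {"text": "b"}])
print(join_highlights(df))
```

An entry with no text, such as a bookmark, could produce such a row, but that would need to be checked against the actual clippings file.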

escafandra · 13 April 2021 08:55

Hi @seatrout
Thanks for your contribution with regexes. Would you mind sharing a section of your highlights file so I can identify the issue?
