Plugin: Extract Paragraphs

djsudduth · 11 February 2024 18:42

I've just released the first version of my paragraph extractor plugin. If you're like me, you have a lot of varied notes in Joplin covering a number of topics. This plugin lets you search across any selected notes for a particular topic word or hashtag contained within a paragraph; the extractor then copies all matching paragraphs from those notes to a single new note. The original notes are not modified in any way.

For example, let's say I'm doing research on planetary atmospheres. I can search for a specific word like 'neptune' or 'mesosphere', and any paragraph in any note I've selected that contains the word will be added to a single new note. As another example, I have a lot of work notes spanning statuses and projects. I could use a project name as the keyword and create a single new note of all paragraphs that mention that project from status notes, goals notes, task notes, etc.
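As a rough illustration of the idea, here is a minimal Python sketch. It is not the plugin's actual code: the `extract_paragraphs` name, the dict-of-notes input, and the blank-line paragraph rule are all assumptions made for the example.

```python
import re


def extract_paragraphs(notes, keyword):
    """Return (title, paragraph) pairs for every paragraph containing keyword.

    notes: mapping of note title -> Markdown body (hypothetical input shape).
    The match is a case-insensitive substring test.
    """
    needle = keyword.lower()
    matches = []
    for title, body in notes.items():
        # Assumption: one or more blank lines separate paragraphs.
        for paragraph in re.split(r"\n\s*\n", body.strip()):
            if needle in paragraph.lower():
                matches.append((title, paragraph.strip()))
    return matches


notes = {
    "Gas giants": "Neptune has a thick atmosphere.\n\nSaturn has rings.",
    "Layers": "The mesosphere sits above the stratosphere.",
}

# Build the body of the single new note; the source notes are left untouched.
new_note = "\n\n".join(p for _, p in extract_paragraphs(notes, "neptune"))
print(new_note)
```

Only the first paragraph of "Gas giants" mentions the keyword, so it is the only one copied into the new note body.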

The repo is here: GitHub - djsudduth/joplin-plugin-paragraph-extractor: Extract specific paragraphs out of Joplin notes using keywords, hashtags or custom tags

Use

Simply highlight the notes you want to extract paragraphs from and choose "Extract paragraphs from notes"

In this example, the keyword is "hydrogen". Two of the four notes have that keyword within the note. Here is one of those two source notes:

When all four notes are selected to extract the paragraphs with that keyword, the new note looks like:

Features

Search and extract by single keyword or hashtag (prefix tags supported #, $, %)
Any keyword used is added to the new note's Joplin tags along with any tags used in the source notes
Option to extract the paragraph H1-H6 headers along with the paragraph
Option to extract from bulleted lists only the bullet that has the keyword
Add a title link back to the original note that the paragraph was extracted from
Option to append a hashtag to the new note
Give the new extraction note a custom title
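To make the bulleted-list option above concrete, here is a small Python sketch of keeping only the bullets that mention the keyword. It is an illustration only, not how the plugin is implemented; `matching_bullets` and the bullet regex are hypothetical.

```python
import re

# Assumption: a bullet is a line starting with -, * or + followed by whitespace.
BULLET = re.compile(r"^\s*[-*+]\s+")


def matching_bullets(block, keyword):
    """From a Markdown list block, keep only the bullets that mention keyword."""
    needle = keyword.lower()
    return [
        line
        for line in block.splitlines()
        if BULLET.match(line) and needle in line.lower()
    ]


block = "- buy hydrogen tanks\n- call the lab\n* Hydrogen safety review"
print(matching_bullets(block, "hydrogen"))
```

The middle bullet is dropped because it never mentions the keyword, while the other two are kept as written.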

I hope everyone really finds it useful and please suggest any features you feel would add to your use cases!

muzak · 11 February 2024 23:41

Here are some feature suggestions:

Search and extract by phrase
Search and extract from a specific notebook
Option to toggle adding keyword as tag to new note (unless it already exists)
Option to show dialog to enter keyword or hashtag and modify custom note title on right-click or keyboard shortcut by default (I didn't expect setting keyword or hashtag in plugin settings)
Option to use content blocks like the Note Overview plugin does to define, organize, and auto-update new note content

djsudduth · 12 February 2024 02:20

Thanks for the suggestions, @muzak !

Search and extract by phrase - this does work now but I haven't tested it extensively
Search and extract from a specific notebook - great idea, I'll look into that
Option to toggle adding keyword as tag to new note (unless it already exists) - there is no toggle yet for that option - the tag is added in all cases, but I'll add it in the next version (this will be good to tie to extract by phrase since phrases shouldn't be tags in most cases)
Option to show dialog to enter keyword or hashtag and modify custom note title on right-click or keyboard shortcut by default (I didn't expect setting keyword or hashtag in plugin settings) - that is something already in the backlog for the next version
Option to use content blocks like the Note Overview plugin does to define, organize, and auto-update new note content - I'm not familiar with that - I'll take a look

PS - I want to thank @JackGruber for the inspiration!

shikuz · 12 February 2024 04:08

very nice @djsudduth !

I think that Joplin deserves more paragraph-level information extraction, and this is a step in the right direction.

in fact, this is similar in some ways to something that I started working on recently. let's see if it's still worth developing a slightly different variation on the same theme.

djsudduth · 9 March 2024 21:21

Paragraph Extractor has been updated to version 1.1.1

New additions:

A dialog box has been added to allow setting the keyword and tag to use as the paragraph extraction search. The keyword and tag are then saved as default values
You can now just select a notebook and use a toggle to extract paragraphs from all the notes within that specific notebook (right click on either a notebook or just a note within the notebook allows selecting all notes)
Keyword phrases are now supported as well

djsudduth · 22 April 2024 00:14

Paragraph Extractor has been updated to version 1.1.2

New additions:

Note paragraph blocks can now be extracted to a note with the Joplin tag title that matched either the hashtag or the keyword - similar to Logseq linked block references in tag notes
The extraction dialog box was modified to be more clear

djsudduth · 28 September 2024 22:27

My Paragraph Extractor plugin has been updated to v-1.1.5. There are quite a few deep features with this plugin, so I've created a quick 15 min tutorial video on how to use it:

Here are the changes since v-1.1.2

Added option to have backlinks to the parent note embedded at the end of each extracted paragraph
Added extraction of full page if the hashtag and keyword are at the end of the note

Hope you find it useful!

PackElend · 15 December 2024 16:56

Hi there,
Great plugin I could use to help me build a decent glossary and wiki.

Any chance that it will update extracted paragraphs at some point? That is, if the source changes, the copy gets updated.

djsudduth · 24 December 2024 18:21

I'm glad you asked that question! I was just working through how that might operate! Tell me more about what you'd like to see!

Here is my current thinking - I'll add some metadata to the extracted note about the sources of extraction so that if you select any notes with that metadata, they will refresh with the new or updated text. This could get a bit complex especially if the original source note changes significantly. (I'll look at automating that later with some type of refresh period).

Do you need the reverse - to update the original note if the extracted note changes?

PackElend · 29 December 2024 20:36

Happy to see that we both have the same thoughts.
Basically, I want to avoid writing the same text twice.

That would be what? The number of paragraphs in the note, character counts, etc.?
Wouldn't it be safer to mark the text and label it? Allow giving it a specific name or using a GUID, e.g.
[extractor] ...text ... [/extractor=my first reused pharagraph]

Could be convenient, but I'm not sure, as you don't know where else the text is used. It may not fit anymore in other places.
If you combine it with backlinks, you have a chance to know where the text is used.

btw, I'm trying to figure out how it could be used with Plugin: Note Link System. It would be great if the paragraphs got IDs so that the other plugin could use them too.
Could be that the feature is currently failing: Plugin: Note Link System - #138 by PackElend

My current focus is to build an extended glossary:
Any suggestions on what plugins could be created? - #240 by PackElend + How to use Joplin to create a Wiki including Glossary

djsudduth · 29 December 2024 23:17

Thanks for responding. The text would still be extracted / duplicated but could be refreshed if the sources change. I'm not sure this helps your glossary needs since my plugin extracts paragraphs and not selections of text. There wouldn't be any linking other than back to the originating notes.

I don't think I would wrap extracted text with identifiers - I would use markdown/html comments as metadata on the sources/paragraphs and setting in the form  at the end of the extracted note. I already have a format I'm testing (e.g., paragraphs have guids, etc). I'm not sure what is available for plugin-to-plugin communication for sharing data. It would be great if Joplin had paragraph / block ids built-in.
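For readers wondering what comment-based metadata might look like, here is a purely hypothetical Python sketch. The comment layout (`extracted source=... para=...`) and the function names are invented for illustration and are not the format the plugin uses.

```python
import re
import uuid

# Hypothetical comment layout; HTML comments are not shown in the rendered note.
META = re.compile(r"<!--\s*extracted source=(\S+) para=(\S+)\s*-->")


def tag_paragraph(paragraph, source_id, para_id=None):
    """Append an HTML comment recording where the paragraph came from."""
    para_id = para_id or uuid.uuid4().hex  # a fresh GUID per paragraph
    return f"{paragraph}\n<!-- extracted source={source_id} para={para_id} -->"


def read_metadata(note_body):
    """Return every (source_id, para_id) pair recorded in a note body."""
    return META.findall(note_body)


body = tag_paragraph("Hydrogen is the lightest element.", "note123", "p1")
print(read_metadata(body))
```

A refresh step could then look up each recorded source and paragraph id and re-copy the current text, though matching becomes harder when the source note changes significantly.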

PackElend · 30 December 2024 12:48

Others use this as well; Plugin: Note Link System should even set IDs for elements.
If both use them, things can be shared easily.

I'm not sure if Joplin supports that natively, but there is some indication that this could be the case: Can Anchors Link Within A Single Note?

I've posted a support question to get an answer on this: Does Joplin support Paragraph, Element IDs

djsudduth · 10 January 2025 23:20

My Paragraph Extractor plugin has been updated to v-1.2.1. I've added the ability to refresh existing extracted paragraph note blocks from their source notes. This allows you to go back to any source notes for a particular extracted note, make changes and refresh without having to create a new note.

I've created a quick tutorial here (if you haven't watched the plugin overview video in the comments above - be sure to do that first):

PackElend · 15 January 2025 20:21

How to keep track of the paragraphs?

Unfortunately, my company's IT is going nuts with restrictions. I cannot use Joplin in my office at the moment (where I used it most).
So I'm not much of a tester anymore.

djsudduth · 15 January 2025 22:42

No problem - I appreciated your comments. Right now, there isn't a good way to keep track of paragraphs. The main issue is what the source paragraph is vs. the extracted one. What if both change? I wish there were paragraph block identifiers - but, Joplin isn't Logseq or RemNote - so those would have to be tracked with a lot of metadata.

BTW, the next version is going to add a diff function to show what was added/deleted in the original note and modified in the extracted note if the paragraphs don't match. That way, anything added or deleted in either note will be visible (if the option is chosen).
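A paragraph-level diff of this kind can be sketched with Python's standard `difflib` module. This is only an illustration of the idea, not the planned implementation; `paragraph_diff` and the file labels are made up for the example.

```python
import difflib


def paragraph_diff(source, extracted):
    """Return unified-diff lines between a source and an extracted paragraph.

    An empty list means the two paragraphs match.
    """
    return list(
        difflib.unified_diff(
            source.splitlines(),
            extracted.splitlines(),
            fromfile="source note",
            tofile="extracted note",
            lineterm="",
        )
    )


for line in paragraph_diff("Neptune is blue.", "Neptune is deep blue."):
    print(line)
```

Lines starting with `-` exist only in the source paragraph and lines starting with `+` exist only in the extracted one, so a change on either side shows up.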

jop030 · 1 March 2025 23:00

Amazing plugin, @djsudduth ! Thanks so much for this!
Also great video tutorial!

As an idea for a really useful additional feature: Automatic extraction of text highlights. In the wysiwyg editor, highlights with the ==mark== syntax can be used, which are extremely useful. It would be amazing if with your plugin you could automatically extract all the highlighted parts of a page.

My suggestion: In the extraction dialog, you could add another checkbox saying "Extract paragraphs with highlighted text". When the checkbox is activated, the paragraphs that have at least one highlight in them are extracted (even if there is no special tag/keyword in that paragraph).

I suggest optionally combining that with another tag/keyword, so that both the highlight and the tag/keyword would trigger an extraction.
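A minimal Python sketch of this suggestion, assuming paragraphs are separated by blank lines and highlights use the `==mark==` syntax; the function name and the optional `keyword` parameter are hypothetical.

```python
import re

# Matches ==highlighted text== (non-greedy, within a single line).
HIGHLIGHT = re.compile(r"==(.+?)==")


def paragraphs_with_highlights(body, keyword=None):
    """Keep paragraphs that contain a highlight or, optionally, the keyword."""
    kept = []
    for paragraph in re.split(r"\n\s*\n", body.strip()):
        has_mark = HIGHLIGHT.search(paragraph) is not None
        has_keyword = keyword is not None and keyword.lower() in paragraph.lower()
        if has_mark or has_keyword:
            kept.append(paragraph)
    return kept


body = "Plain text.\n\nThis is ==important== stuff.\n\nMentions neptune only."
print(paragraphs_with_highlights(body))
print(paragraphs_with_highlights(body, keyword="neptune"))
```

With no keyword, only the highlighted paragraph is kept; with a keyword, either trigger is enough for a paragraph to be extracted.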

Also very much looking forward to see this working in the mobile app, as indicated as an outlook in your video tutorial.

Again: Great work! Thanks so much for your great contribution!

djsudduth · 2 March 2025 00:28

Thanks!! Those are some great ideas and I’ll take a look at adding that.
