Plugin: Extract Paragraphs

I've just released the first version of my paragraph extractor plugin. If you're like me, you have a lot of varied notes in Joplin covering a number of topics. This plugin lets you search across any selected notes for a particular topic word or hashtag. The extractor copies every paragraph that contains it, from all those notes, to a single new note. The original notes are not modified in any way.

For example, let's say I'm doing research on planetary atmospheres. I can search for a specific word like 'neptune' or 'mesosphere', and any paragraph in any note I've selected that contains the word will be added to a single new note. As another example, I have a lot of work notes spanning statuses and projects. I could use a project name as the keyword and create a single new note of all paragraphs that mention that project from status notes, goals notes, task notes, etc.
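To make the idea concrete, here is a minimal TypeScript sketch of keyword-based paragraph extraction. It is not the plugin's actual code: the `Note` and `Extracted` shapes, the function names, and the rule that paragraphs are separated by blank lines are all assumptions made for illustration.

```typescript
// Hypothetical shapes for illustration; not the plugin's real types.
interface Note {
  title: string;
  body: string;
}

interface Extracted {
  sourceTitle: string;
  paragraph: string;
}

// Assumption: a paragraph is any run of text separated by blank lines.
function splitParagraphs(body: string): string[] {
  return body
    .split(/\r?\n\s*\r?\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
}

// Collect every paragraph that contains the keyword (case-insensitive).
// The source notes are only read, never modified.
function extractParagraphs(notes: Note[], keyword: string): Extracted[] {
  const needle = keyword.toLowerCase();
  const found: Extracted[] = [];
  for (const note of notes) {
    for (const paragraph of splitParagraphs(note.body)) {
      if (paragraph.toLowerCase().indexOf(needle) !== -1) {
        found.push({ sourceTitle: note.title, paragraph });
      }
    }
  }
  return found;
}

// Join the matches into the body of a single new note.
function buildExtractNote(matches: Extracted[]): string {
  return matches.map((m) => m.paragraph).join("\n\n");
}
```

Calling `extractParagraphs(selectedNotes, "neptune")` and passing the result to `buildExtractNote` would give the text of the new note, with matches kept in the order the notes were selected.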

The repo is here: GitHub - djsudduth/joplin-plugin-paragraph-extractor: Extract specific paragraphs out of Joplin notes using keywords, hashtags or custom tags

Use

Simply highlight the notes you want to extract paragraphs from and choose "Extract paragraphs from notes"

In this example, the keyword is "hydrogen". Two of the four notes have that keyword within the note. Here is one of those two source notes:

When all four notes are selected to extract the paragraphs with that keyword, the new note looks like:

Features

  • Search and extract by single keyword or hashtag (prefix tags supported #, $, %)
  • Any keyword used is added to the new note's Joplin tags along with any tags used in the source notes
  • Option to extract the paragraph H1-H6 headers along with the paragraph
  • Option to extract from bulleted lists only the bullet that has the keyword
  • Add a title link back to the original note that the paragraph was extracted from
  • Option to append a hashtag to the new note
  • Give the new extraction note a custom title
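As a rough illustration of the header and bullet options in the list above, the sketch below shows one way they could behave. It is an assumption-laden example, not the plugin's implementation: the option names `includeHeaders` and `bulletsOnly`, the regular expressions, and the choice to attach the nearest preceding header are all hypothetical.

```typescript
// Hypothetical option names; the plugin's real settings may differ.
interface ExtractOptions {
  includeHeaders: boolean; // prepend the nearest preceding H1-H6 header
  bulletsOnly: boolean;    // from a bulleted list, keep only the matching bullets
}

// "#" to "######" followed by a space, so "#hashtag" is not a header.
const HEADER = /^#{1,6}\s+\S/;
const BULLET = /^\s*[-*+]\s+/;

function extractBlocks(body: string, keyword: string, opts: ExtractOptions): string[] {
  const needle = keyword.toLowerCase();
  const results: string[] = [];
  let lastHeader = "";

  for (const raw of body.split(/\r?\n\s*\r?\n/)) {
    const block = raw.trim();
    if (block.length === 0) continue;

    let lines = block.split(/\r?\n/);
    // Remember the most recent header; a header alone is not a paragraph.
    if (HEADER.test(lines[0])) {
      lastHeader = lines[0];
      lines = lines.slice(1);
      if (lines.length === 0) continue;
    }

    let kept: string[];
    if (opts.bulletsOnly && lines.every((l) => BULLET.test(l))) {
      // Bulleted list: keep only the bullets that mention the keyword.
      kept = lines.filter((l) => l.toLowerCase().indexOf(needle) !== -1);
    } else {
      // Ordinary paragraph: keep it whole if any part mentions the keyword.
      kept = lines.join("\n").toLowerCase().indexOf(needle) !== -1 ? lines : [];
    }
    if (kept.length === 0) continue;

    const text = kept.join("\n");
    results.push(opts.includeHeaders && lastHeader !== "" ? lastHeader + "\n\n" + text : text);
  }
  return results;
}
```

With both options off, a matching list is copied whole and no headers are added, which matches the plain paragraph behavior described earlier.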

I hope everyone really finds it useful and please suggest any features you feel would add to your use cases!

Here are some feature suggestions:

  • Search and extract by phrase
  • Search and extract from a specific notebook
  • Option to toggle adding keyword as tag to new note (unless it already exists)
  • Option to show dialog to enter keyword or hashtag and modify custom note title on right-click or keyboard shortcut by default (I didn't expect setting keyword or hashtag in plugin settings)
  • Option to use content blocks like the Note Overview plugin does to define, organize, and auto-update new note content

Thanks for the suggestions, @muzak !

  • Search and extract by phrase - this does work now but I haven't tested it extensively

  • Search and extract from a specific notebook - great idea, I'll look into that

  • Option to toggle adding keyword as tag to new note (unless it already exists) - there is no toggle yet for that option - the tag is added in all cases, but I'll add it in the next version (this will be good to tie to extract by phrase since phrases shouldn't be tags in most cases)

  • Option to show dialog to enter keyword or hashtag and modify custom note title on right-click or keyboard shortcut by default (I didn't expect setting keyword or hashtag in plugin settings) - that is something already in the backlog for the next version

  • Option to use content blocks like the Note Overview plugin does to define, organize, and auto-update new note content - I'm not familiar with that - I'll take a look

PS - I want to thank @JackGruber for the inspiration!

very nice @djsudduth !

I think that Joplin deserves more paragraph-level information extraction, and this is a step in the right direction.

in fact, this is similar in some ways to something that I started working on recently. let's see if it's still worth developing a slightly different variation on the same theme.

Paragraph Extractor has been updated to version 1.1.1

New additions:

  • A dialog box has been added for setting the keyword and tag used for the paragraph extraction search. The keyword and tag are then saved as default values
  • You can now just select a notebook and use a toggle to extract paragraphs from all the notes within that specific notebook (right-clicking either a notebook or a note within the notebook allows selecting all notes)
  • Keyword phrases are now supported as well

Paragraph Extractor has been updated to version 1.1.2

New additions:

  • Note paragraph blocks can now be extracted to a note with the Joplin tag title that matched either the hashtag or the keyword - similar to Logseq linked block references in tag notes
  • The extraction dialog box was modified to be clearer

My Paragraph Extractor plugin has been updated to v-1.1.5. There are quite a few deep features with this plugin, so I've created a quick 15 min tutorial video on how to use it:

Here are the changes since v-1.1.2

  • Added option to have backlinks to the parent note embedded at the end of each extracted paragraph
  • Added extraction of full page if the hashtag and keyword are at the end of the note

Hope you find it useful!

Hi there,
Great plugin I could use to help me build a decent glossary and wiki.

Any chance that it will keep extracted paragraphs up to date, so that if the source changes, the copy gets updated?

I'm glad you asked that question! I was just working through how that might operate! Tell me more about what you'd like to see!

Here is my current thinking - I'll add some metadata to the extracted note about the sources of extraction so that if you select any notes with that metadata, they will refresh with the new or updated text. This could get a bit complex especially if the original source note changes significantly. (I'll look at automating that later with some type of refresh period).

Do you need the reverse - to update the original note if the extracted note changes?

Happy to see that we both have the same thoughts :slight_smile:
Basically, I want to avoid writing the same text twice.

That would be what? The number of the paragraph in the note, a character count, etc.?
Wouldn't it be safer to mark the text and label it? Allow giving it a specific name or use a GUID, e.g.
[extractor] ...text ... [/extractor=my first reused paragraph]

That could be convenient, but I'm not sure, as you don't know where else the text is used. It may not fit anymore in other places.
If you combine it with backlinks, you have a chance to know where the text is used.

BTW, I'm trying to figure out how it could be used with Plugin: Note Link System. It would be great if the paragraphs got IDs so the other plugin could use them too.
It could be that the feature is currently failing: Plugin: Note Link System - #138 by PackElend

My current focus is to build an extended glossary:
Any suggestions on what plugins could be created? - #240 by PackElend + How to use Joplin to create a Wiki including Glossary

Thanks for responding. The text would still be extracted / duplicated but could be refreshed if the sources change. I'm not sure this helps your glossary needs since my plugin extracts paragraphs and not selections of text. There wouldn't be any linking other than back to the originating notes.

I don't think I would wrap extracted text with identifiers. I would use markdown/HTML comments as metadata on the sources/paragraphs and settings, in the form <!-- extraction source data here --> at the end of the extracted note. I already have a format I'm testing (e.g., paragraphs have GUIDs, etc). I'm not sure what is available for plugin-to-plugin communication for sharing data. It would be great if Joplin had paragraph / block IDs built in.
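For readers wondering what comment-based metadata might look like, here is a small TypeScript sketch. The actual format being tested is not shown in this thread, so everything here is assumed: the `paragraph-extractor:` marker, the `ExtractionMeta` fields, and the use of JSON inside the comment are hypothetical.

```typescript
// Hypothetical metadata shape; the plugin's real format is not shown in this thread.
interface ExtractionMeta {
  keyword: string;
  sources: { noteId: string; paragraphId: string }[];
}

// Assumed marker so the comment can be found again later.
const META_PREFIX = "<!-- paragraph-extractor:";
const META_SUFFIX = " -->";

// Append the metadata as an HTML comment at the end of the extracted note.
function appendMeta(body: string, meta: ExtractionMeta): string {
  // Escape ">" so the JSON can never contain the comment terminator "-->".
  const json = JSON.stringify(meta).replace(/>/g, "\\u003e");
  return body.replace(/\s+$/, "") + "\n\n" + META_PREFIX + json + META_SUFFIX + "\n";
}

// Read the metadata back; returns null if the note has none or it is malformed.
function readMeta(body: string): ExtractionMeta | null {
  const start = body.lastIndexOf(META_PREFIX);
  if (start === -1) return null;
  const end = body.indexOf(META_SUFFIX, start);
  if (end === -1) return null;
  try {
    return JSON.parse(body.slice(start + META_PREFIX.length, end)) as ExtractionMeta;
  } catch (err) {
    return null;
  }
}
```

Because HTML comments are not rendered in the note viewer, the extracted note still reads cleanly, while a later refresh could call `readMeta` to find which source notes to re-read.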

Others use this as well; Plugin: Note Link System should even set IDs for elements.
If both use them, things can be shared easily.

I'm not sure if Joplin supports that natively, but there is some indication that this could be the case: Can Anchors Link Within A Single Note?

I've posted a support question to get an answer on this: Does Joplin support Paragraph, Element IDs

My Paragraph Extractor plugin has been updated to v-1.2.1. I've added the ability to refresh existing extracted paragraph note blocks from their source notes. This allows you to go back to any source notes for a particular extracted note, make changes and refresh without having to create a new note.

I've created a quick tutorial here (if you haven't watched the plugin overview video in the comments above - be sure to do that first):

How to keep track of the paragraphs?

Unfortunately, my company's IT is going nuts with restrictions. I cannot use Joplin in my office at the moment (where I used it most).
So I'm not much of a tester anymore.

No problem - I appreciated your comments. Right now, there isn't a good way to keep track of paragraphs. The main issue is what the source paragraph is vs. the extracted one. What if both change? I wish there were paragraph block identifiers - but, Joplin isn't Logseq or RemNote - so those would have to be tracked with a lot of metadata.

BTW, the next version is going to add a diff function to show what was added/deleted in the original note and modified in the extracted note if the paragraphs don't match. That way, anything added or deleted in either note will be visible (if the option is chosen).
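To give a feel for what such a comparison involves, here is a minimal word-level diff sketch in TypeScript based on a longest-common-subsequence table. It is only an illustration and says nothing about how the next version will actually do it: the function names, the word-level granularity, and the `~~deleted~~` / `**added**` rendering are all assumptions.

```typescript
type DiffOp = { kind: "same" | "added" | "deleted"; text: string };

// Compare two paragraphs word by word.
// "deleted" = only in `before`, "added" = only in `after`.
function diffWords(before: string, after: string): DiffOp[] {
  const a = before.split(/\s+/).filter((w) => w.length > 0);
  const b = after.split(/\s+/).filter((w) => w.length > 0);

  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..].
  const lcs: number[][] = [];
  for (let i = 0; i <= a.length; i++) {
    const row: number[] = [];
    for (let j = 0; j <= b.length; j++) row.push(0);
    lcs.push(row);
  }
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk the table from the start, emitting one operation per word.
  const ops: DiffOp[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      ops.push({ kind: "same", text: a[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      ops.push({ kind: "deleted", text: a[i] });
      i++;
    } else {
      ops.push({ kind: "added", text: b[j] });
      j++;
    }
  }
  while (i < a.length) ops.push({ kind: "deleted", text: a[i++] });
  while (j < b.length) ops.push({ kind: "added", text: b[j++] });
  return ops;
}

// Render the diff with markdown emphasis (an assumed presentation).
function renderDiff(ops: DiffOp[]): string {
  return ops
    .map((op) =>
      op.kind === "same" ? op.text : op.kind === "added" ? "**" + op.text + "**" : "~~" + op.text + "~~"
    )
    .join(" ");
}
```

The table costs memory proportional to the product of the two word counts, which should be fine for paragraph-sized text but would need a smarter algorithm for whole notebooks.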
