NOTE:
My coding knowledge is currently very limited. This plugin was created entirely with AI tools, and I may be limited in my ability to fix any issues.
Paste as Markdown
A Joplin plugin that allows you to paste HTML formatted text as markdown in the markdown editor.
The plugin uses Turndown to convert HTML to markdown.
Useful for scenarios where you can't use the Joplin Web Clipper (e.g. copying text from an email client) and/or where you don't want to edit the note with the rich text editor (to avoid changes to existing markdown formatting by the rich text editor).
The plugin prioritizes clean markdown. The only HTML elements that are retained are <img> embeds (only if the image has a specified width/height) and <sup>/<sub>/<ins>. <br> tags are removed and excess whitespace is normalized.
How to use
In the markdown editor, right click and select "Paste as Markdown" (or use the keyboard shortcut, Ctrl + Alt + V by default).
If you have HTML formatted text in the clipboard, the plugin will convert it to markdown formatting and paste the markdown formatted text.
If you don't have HTML formatted text in the clipboard, the plugin will fall back to pasting the plain text.
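The fallback behavior described above can be sketched roughly as follows. This is only an illustration, not the plugin's actual code: the ClipboardContents shape, the choosePasteText name, and the convert callback (standing in for the Turndown step) are all hypothetical.

```typescript
// Hypothetical shape of what the clipboard holds; the plugin's real types may differ.
interface ClipboardContents {
  html: string | null;
  text: string | null;
}

// `convert` stands in for the HTML-to-markdown step (Turndown in the plugin).
function choosePasteText(
  clipboard: ClipboardContents,
  convert: (html: string) => string
): string {
  const html = clipboard.html;
  if (html !== null && html.trim() !== "") {
    return convert(html);
  }
  // No HTML flavor available: fall back to the plain text as-is.
  return clipboard.text !== null ? clipboard.text : "";
}
```

Passing the converter in as a parameter keeps the decision logic testable without a real clipboard or DOM.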
Features
Image Handling - Keep remote/base64 encoded images as-is, convert images to Joplin resources, or remove images entirely.
DOM preprocessing - Sanitizes HTML with DOMPurify and uses DOM pre-processing to remove unwanted elements before turndown conversion.
Heading normalization - Removes all nested markup from Headings so that turndown emits a clean markdown heading.
Code block normalization - Improved reliability when pasting code blocks. Normalizes known code block wrappers/containers to simple <pre>/<code>, infers the language from common class patterns, and applies a normalized class="language-xxx".
Text normalization - Normalizes nbsp and zero width space characters to regular spaces. (Optionally) normalizes smart quotes to regular quotes.
List normalization - Re-nests orphaned lists, so numbering/indentation is properly preserved when pasting nested lists from sources like Outlook/Google Docs/OneNote. Applies uniform spacing (one space) after list markers.
Image normalization - HTML <img> embeds will only contain a standardized set of attributes: src, alt, title, width, height. Images that are converted to joplin resources will be unwrapped from external anchor links. Promotes inline css width/height to HTML attributes so that image sizes are maintained through turndown conversion.
Whitespace normalization - Minimal post-processing to remove leftover <br> elements and excess whitespace between paragraphs.
Table support - HTML tables are converted to markdown tables via turndown-plugin-gfm. Additionally, the plugin wraps orphaned table elements with <table> tags, allowing cells copied from Excel/Google Sheets to be pasted as tables.
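As a rough illustration of the text normalization feature listed above, here is a minimal string-based sketch. The plugin does this during DOM processing, so the function name, signature, and exact character set here are assumptions for the sake of the example.

```typescript
// Sketch only: `normalizeText` and its parameters are hypothetical names.
function normalizeText(input: string, normalizeQuotes: boolean): string {
  // Non-breaking space (U+00A0) and zero-width space (U+200B) become regular spaces.
  let out = input.replace(/[\u00A0\u200B]/g, " ");
  if (normalizeQuotes) {
    out = out
      .replace(/[\u2018\u2019]/g, "'") // curly single quotes
      .replace(/[\u201C\u201D]/g, '"'); // curly double quotes
  }
  return out;
}
```

The second argument mirrors the optional "Normalize smart quotes" setting: when it is false, quotes are left untouched.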
Settings
Include Images - By default, images (external or base64 encoded) are included in the pasted text. If desired, you can un-check include images in the plugin settings so that images are not included in the pasted text.
Convert images to Joplin resources - If enabled (along with Include images), external (http/https) and base64 encoded images will automatically be converted to Joplin resources.
Normalize smart quotes - Convert Word/Office smart quotes to regular quotes for better markdown compatibility.
Force tight lists - Removes space between list items.
Other notes
This is similar to the existing "Paste Special" plugin, however that plugin appears to be unmaintained and doesn't have a keyboard shortcut or right click context menu.
Wrap orphaned table elements with <table></table>, fixes copy/pasting from spreadsheets like excel and google sheets so that text is pasted as a markdown table 0fc8751
NOTE: When copy/pasting from a spreadsheet, the pasted markdown table will have an empty table header (because excel/sheets, etc... don't differentiate between table content/headers in their text/html clipboard content).
Enable @joplin/turndown 'preserveImageTagsWithSize' setting, so that image sizes are retained when pasting as markdown 981806e
Improvements to heading anchor cleanup logic, now handles more cases. Cleans up useless empty heading links when copying from places like github readme or discourse posts.
Enable preserveNestedTables setting
Improve image removal logic: Also remove links whose sole meaningful children were images (for images that were embedded inside links, clean up the leftover image links too)
Fix issues pasting text with soft breaks
Fix empty leading lines when pasting HTML fragments
Remove <br> tags from converted markdown (except in tables, inline code, and code blocks). You'll now get very clean markdown even when pasting from sources like Outlook emails, which previously resulted in <br> all over the place instead of markdown paragraphs (an issue that also happens when pasting into Joplin's rich text editor).
Remove excessive empty lines between paragraphs. If you've ever used Obsidian, you're probably familiar with how its Turndown implementation often results in massive excess whitespace when pasting HTML formatted text. Not an issue here!
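The blank-line cleanup above could look something like the sketch below. It is not the plugin's implementation: collapseBlankLines is a hypothetical name, and leaving fenced code blocks alone is my assumption about how to avoid touching code.

```typescript
// Matches an opening or closing code fence (three backticks or three tildes).
const FENCE = new RegExp("^\\s*(`{3}|~{3})");

// Sketch only: collapse runs of blank lines to a single blank line outside fences.
function collapseBlankLines(markdown: string): string {
  const out: string[] = [];
  let inFence = false;
  let previousBlank = false;
  for (const line of markdown.split("\n")) {
    if (FENCE.test(line)) {
      inFence = !inFence;
    }
    const blank = line.trim() === "";
    // Outside fences, drop a blank line that directly follows another blank line.
    if (!inFence && blank && previousBlank) {
      continue;
    }
    out.push(line);
    previousBlank = blank;
  }
  return out.join("\n");
}
```

A line-by-line pass like this keeps paragraph breaks (one blank line) while discarding the extra ones that HTML conversion tends to leave behind.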
v1.0.3
Don't collapse excess empty lines in code blocks 1b155c3
DOM-based preprocessing - Sanitizes HTML with DOMPurify and performs DOM pre-processing to remove unwanted elements (empty permalink anchors, exotic image attributes, etc.).
Standardized formatting for <img> embeds, remove all attributes except for the typical src, alt, and width/height.
New Feature - Option to convert images to Joplin resources on paste (like the Rich Text editor does). Only works with http(s) or base64 encoded images.
Note: For images that are wrapped inside links, the images will be unwrapped (and the link removed) if converting the image to a Joplin resource (for cleaner markdown, and avoids console errors with Rich markdown plugin).
v1.0.5
Convert code blocks to plain text before the DOMPurify step, fixes issue where tags (e.g. <script>/<style>) not allowed by DOMPurify could be removed from code block text.
Significantly reduces plugin size and compile time due to removal of JSDOM dependency
Latest upstream improvements
Images with explicitly defined sizes will still be kept as <img> embeds with the specified width/height (this previously relied on joplin-turndown specific functionality but was re-implemented as a turndown rule).
Switch from @joplin/turndown-plugin-gfm fork to @truto/turndown-plugin-gfm fork
Joplin's gfm plugin is focused on joplin's wysiwyg editor use case - maintaining the original table styling as closely as possible, and to do this it keeps tables as HTML which doesn't play nicely with my plugin's approach of sanitizing/cleaning the HTML before turndown conversion (and focusing on clean markdown).
Works far better than the unmaintained upstream turndown-plugin-gfm plugin in my testing
Improves copy/paste from excel - first row will now be used as the table header
Tables will now always be converted to markdown. For complex tables, the conversion isn't 100% perfect, but works very well with the new gfm plugin in my testing. Using the complex "cheat sheet" plugin from joplin's Markdown Guide for example:
Simplify regex post-processing to remove leftover logic to handle <br> in tables; the new GFM plugin handles this now.
Joplin resource conversion improvements: Sanitized filename extraction, use path.join(), path traversal validation. Simplify temp file clean up to prevent potential race condition. Improved error handling.
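To give a feel for what sanitized filename extraction might involve, here is a small string-only sketch. It is an assumption-heavy illustration: safeFileName, the allowed character set, and the fallback parameter are all hypothetical, and the plugin's real logic (which also uses path.join() and path traversal validation) is likely different.

```typescript
// Sketch only: derive a conservative file name from an image URL or path.
function safeFileName(urlPath: string, fallback: string): string {
  // Drop query string and fragment, then keep only the last path segment.
  const withoutQuery = urlPath.split(/[?#]/)[0];
  const lastSegment = withoutQuery.split(/[\\/]/).pop() || "";
  // Keep a conservative character set; everything else becomes "_".
  // Leading dots are stripped so names like ".." can never survive.
  const cleaned = lastSegment.replace(/[^A-Za-z0-9._-]/g, "_").replace(/^\.+/, "");
  return cleaned !== "" ? cleaned : fallback;
}
```

Because only the last segment is kept and separators are discarded, input such as "../../etc/passwd" reduces to a bare file name rather than a path.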
v1.0.8
Normalize alt text to remove line breaks and control characters during DOM processing. Fixes issues copy/pasting images from MS Word (for images without an explicitly defined alt text, Word apparently uses AI to generate an alt text and inserts two line breaks followed by "AI-generated content may be incorrect" inside the alt text, and the line breaks in the alt text break rendering of the pasted image embed).
Improve DOMPurify config
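The alt text normalization above can be approximated with a couple of regex passes. This is a sketch under my own assumptions (normalizeAltText is a hypothetical name, and replacing control characters with a single space is a choice made for the example, not necessarily what the plugin does).

```typescript
// Sketch only: flatten alt text to a single line so the image embed renders.
function normalizeAltText(alt: string): string {
  return alt
    .replace(/[\u0000-\u001F\u007F]+/g, " ") // line breaks and other control characters
    .replace(/ {2,}/g, " ") // collapse any doubled spaces this leaves behind
    .trim();
}
```

For the MS Word case described above, the two line breaks inside the generated alt text become a single space, so the alt text stays on one line.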
v1.0.9
Add option to normalize smart quotes to regular quotes (enabled by default)
v1.0.10
Fixed issue where [when normalize quotes setting is enabled] smart quotes sometimes weren't normalized if the HTML fragment included a code block
remove common UI elements such as buttons during DOM processing, as they aren't relevant to markdown conversion and can result in noise like text labels from buttons (e.g. when copying from llm chats)
improved error handling
retry logic for downloading remote images
fix path validation bug
include title attribute in html <img> embeds (if present)
v1.0.11/v1.0.12
Add option to force tight lists (removes blank lines between list items). NOTE: Doesn't remove blank lines between multiple block elements in lists (e.g. list item with multiple paragraphs)
If source HTML contains escaped HTML entities in prose (e.g. using <table> as an example in a paragraph, without inline code), wrap them in code (so Turndown emits them as inline code, e.g. <table>). Prevents raw HTML tags in resulting markdown (especially problematic with <table> since a stray table tag results in joplin rendering all subsequent text in a table).
If style attribute is present (and contains width and/or height in px), use those as the html img width/height attributes (ONLY if img element doesn't already contain width and/or height). Allows image sizes to be maintained when copying from OneNote, which only defines the width/height in the style attribute.
Improved link unwrapping logic when converting remote images to Joplin resources. Seeing stray brackets above/below images (that were wrapped in links) should now be a much rarer occurrence.
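The inline style size promotion mentioned above (width/height in px taken from the style attribute) might be sketched like this. The names ImageSize and sizeFromStyle are hypothetical, and the plugin works on DOM elements rather than raw style strings, so treat this as an illustration of the parsing step only.

```typescript
// Hypothetical result type: only the dimensions that were found.
interface ImageSize {
  width?: number;
  height?: number;
}

// Sketch only: pull px width/height out of an inline style string.
function sizeFromStyle(style: string): ImageSize {
  const size: ImageSize = {};
  for (const declaration of style.split(";")) {
    // Anchored at the start so "max-width" or "min-height" do not match.
    const match = /^\s*(width|height)\s*:\s*(\d+(?:\.\d+)?)px\s*$/i.exec(declaration);
    if (match === null) continue;
    const value = Math.round(parseFloat(match[2]));
    if (value <= 0) continue; // skip 0px values
    if (match[1].toLowerCase() === "width") size.width = value;
    else size.height = value;
  }
  return size;
}
```

A caller would apply the result only when the <img> element has no width/height attributes of its own, matching the rule described above.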
v1.0.14-v1.0.16
fix issue where tight lists option didn't apply to lists in block quotes
adjusted inline style image size promotion logic to skip 0 width/height values, fixes issue where html <img> embeds could be pasted with 0px width or height value.
Performance/Security: pass sanitized/cleaned DOM node directly to turndown instead of unnecessarily serializing to string.
Security: Don't fall back to original HTML if DOM sanitization fails, fall back to plain text instead.
v1.0.17
Improved heading anchor cleanup logic (fixes turndown emitting bracket fragments in some cases)
Fix list numbering/indentation when pasting from Outlook/Google Docs/OneNote (and probably others). Pasting nested lists (ordered list with nested unordered lists) from these sources will now properly maintain list numbering and indentation.
v1.0.18 - v1.0.19
table conversion: forked turndown-gfm plugin and added fix for table conversion where tables could be created without delimiter row
resource conversion: improve path validation to prevent potential platform specific edge cases
pre-processing: relaxed button element removal to keep buttons that are inline with text (llm chats often use them for styling text)
turndown conversion: uniform list marker spacing, there is now one space after all list markers (turndown defaults were a bit over the top with three spaces after bullets)
If HTML sanitization fails, fall back to clipboard text/plain (if available) or show toast message instead of trying to extract plain text from HTML (text/plain will typically be available alongside text/html in clipboard anyway)
v1.0.20 - v1.0.22
Pre-processing now finds <b> and <strong> tags within heading elements (h1-h6) and unwraps them, as headings are already rendered as bold in Joplin (and most markdown renderers), making the extra ** tags redundant.
Added an explicit HR guard in tightenListSpacing() so the blank-line stripper skips Markdown horizontal rules (---, ***, ___, spaced variants). Fixes bug where there isn't a blank line between end of a list and following <hr> when force tight lists is enabled.
improve empty image anchor cleanup so it handles decorative svgs
significant improvements to code block normalization and language detection. Now validates against languages supported by highlight.js (fixing issue where you could end up with things such as "default" or "container" for code fence language).
set alias for text/plaintext/plain language to txt (because txt is the only one that disables syntax highlighting in both the joplin markdown editor and viewer).
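To show why an HR guard matters when tightening lists, here is a simplified sketch. It is not the plugin's tightenListSpacing(): the function name, the regexes, and the line-based approach are my assumptions, and this version does not track fenced code blocks or multi-paragraph list items.

```typescript
// A bullet or numbered list item followed by some content.
const LIST_ITEM = /^\s*(?:[-*+]|\d+[.)])\s+\S/;
// Horizontal rules: three or more -, * or _ with optional spaces between them.
const HORIZONTAL_RULE = /^\s{0,3}([-*_])(?:\s*\1){2,}\s*$/;

function isListItem(line: string): boolean {
  // The guard: "* * *" looks like a bullet, but it is a horizontal rule.
  return !HORIZONTAL_RULE.test(line) && LIST_ITEM.test(line);
}

// Sketch only: drop blank lines that sit directly between two list items.
function tightenListSpacingSketch(markdown: string): string {
  const lines = markdown.split("\n");
  const out: string[] = [];
  for (let i = 0; i < lines.length; i++) {
    const isBlank = lines[i].trim() === "";
    const prev = out.length > 0 ? out[out.length - 1] : "";
    const next = i + 1 < lines.length ? lines[i + 1] : "";
    if (isBlank && isListItem(prev) && isListItem(next)) continue;
    out.push(lines[i]);
  }
  return out.join("\n");
}
```

Without the guard, a spaced rule such as "* * *" after a list would be treated as another bullet and lose the blank line that separates it from the list.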
v1.0.24
fix issue with language inference regex so that it catches highlight-source-* pattern
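Language inference from class names could be sketched as below. The language- and highlight-source-* prefixes come from the notes above; the lang- prefix, the tiny KNOWN_LANGUAGES list (a stand-in for the full set of languages supported by highlight.js), and the function name are assumptions made for this example.

```typescript
// Small illustrative subset; the plugin validates against highlight.js's supported languages.
const KNOWN_LANGUAGES = new Set<string>(["javascript", "typescript", "python", "bash", "json", "txt"]);
// Aliases that map to `txt`, the value that disables syntax highlighting in Joplin.
const PLAIN_TEXT_ALIASES = new Set<string>(["text", "plaintext", "plain"]);

// Sketch only: return a validated code fence language, or null if none is found.
function inferLanguage(className: string): string | null {
  for (const token of className.split(/\s+/)) {
    const match = /^(?:language-|lang-|highlight-source-)([\w+#-]+)$/i.exec(token);
    if (match === null) continue;
    const name = match[1].toLowerCase();
    if (PLAIN_TEXT_ALIASES.has(name)) return "txt";
    if (KNOWN_LANGUAGES.has(name)) return name;
  }
  return null;
}
```

Validating against a known list is what keeps values like "default" or "container" from ending up as the code fence language.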
v1.0.25 - v1.0.26
improve heading anchor cleanup so it removes permalink symbols
improve image conversion temp file cleanup
improve pasteHandler error handling
Add turndown rule for <ins>
fix build issue where test files were copied to /dist
v1.0.27
Fix issue where checkboxes weren't reliably converted to task lists: Added DOM pass to lift paragraph-wrapped checkboxes so Turndown's GFM rule can emit task list markers.
v1.0.28 - v1.0.29
trim leading/trailing empty lines in code blocks
Updated code block normalization so that empty highlighter spans are stripped before Turndown sees them, ensuring <pre> keeps only its <code> child. Fixes issue where some code blocks could be converted to inline code instead of a fenced code block.
Updated turndown-gfm plugin to fix bug where table cell content could be over-escaped (e.g. \- becoming -\\).
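Trimming leading/trailing empty lines in code blocks, as noted above, can be illustrated with a short helper. The name trimEmptyEdgeLines is hypothetical, and this string-based version is only a sketch of the idea.

```typescript
// Sketch only: remove empty lines at the start and end of code block text,
// leaving inner blank lines and indentation untouched.
function trimEmptyEdgeLines(code: string): string {
  const lines = code.split("\n");
  let start = 0;
  let end = lines.length;
  while (start < end && lines[start].trim() === "") start++;
  while (end > start && lines[end - 1].trim() === "") end--;
  return lines.slice(start, end).join("\n");
}
```

Working on whole lines rather than calling trim() on the full string preserves the indentation of the first line of code.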
v1.0.30
Adjusted UI element cleanup to handle role based buttons the same way as it handles <button> elements (to avoid removing inline text).
Updated turndown-gfm plugin, includes improvement where it won't convert single row/single cell tables to markdown
Update dependencies
v1.0.31
Some sources (e.g. Claude web chat) wrap tables in <pre> tags, which would trigger the plugin's code block normalization, resulting in the table being flattened and converted to a fenced code block. Adjusted code block neutralization/normalization to unwrap tables from <pre> tags to prevent this from happening.
v1.0.32
normalize headings to prevent empty line between heading characters and text in scenario where html tags like <p> are wrapped inside headings
Expand space normalization to convert zero-width spaces (and variants) to regular spaces (as they show up as weird red dots in joplin markdown editor)
v1.0.33
Handle codemirror code blocks in code normalization
v1.0.35
Code block normalization now handles scenarios where code language is in a span or div (containing only the code language) before the code block. Previously, this would result in the code language being placed above the code fence in the converted markdown, now the span/div is removed and the language is used as the code fence language (only if the span/div immediately precedes the code block and text only contains one of the supported code highlighting languages).
Improved removal of copy buttons from code blocks (prevents "Copy" text appearing above code fence).
Expanded control character cleanup to handle directional control characters
Update turndown to 7.2.2
v1.0.36
Simplified & improved heading cleanup, will now remove all nested markup from headings