New Joplin MCP Server

Happy Friday! We made a gemini-cli extension for Joplin as part of our internal tooling at belsar.ai, and I got the greenlight to make it public. It's super useful for doing system design with Mermaid and as a team knowledge store (think Notion).

Github: https://github.com/belsar-ai/joplin-mcp


6 Likes

Heyo, I checked out this project and I have to say that I really don’t like the design choice of using a script execution engine rather than small, purpose-specific tools. The code of this MCP is a perfect example of why MCP servers can be a high-risk deployment: you use vm as if it isolated code. Maybe it does, but not in a secure manner; it is easily possible to escape the vm. There are solutions like isolated-vm which would fit better here, but they are also not perfect.

You are half-correct and half-incorrect. You are correct that vm is for execution isolation, not for sandboxing.

Where you are incorrect is that we need sandboxing here. These are all templated local calls to the Joplin API, and joplin-mcp has the flexibility to chain templated calls together for efficiency.

Using a tool like isolated-vm or WASM here for sandboxing adds a significant amount of installation complexity, but the security gains are largely academic. You've got to think about the attack surface: for a tool like joplin-mcp, which only makes local calls to Joplin, the primary threat comes from supply chain attacks. But we don't have any exotic dependencies. If somebody has already pwn3d your box, having isolated-vm on joplin-mcp would be akin to deadbolting and latching your front door while leaving your back door and garage door wide open.

In fact, we started with the dedicated-tools architecture you described, but switched in order to follow Anthropic’s best practices for creating MCP servers: https://www.anthropic.com/engineering/code-execution-with-mcp - and ended up with significant improvements to performance and token usage as a result.

2 Likes

Are you imagining this MCP as something only you run locally, or as something other people might also connect to?

If it’s truly single‑user and never exposed beyond your own machine, then yeah, most of the scary remote attack surface goes away and the main risks are supply chain and prompt‑injection through the model.

But as soon as you let anyone else access this MCP, the attack vector you describe as basically non‑existent suddenly becomes very real: a compromised client, prompt, or user can drive the script engine.

Furthermore, because everything goes through a generic “execute script” tool, it’s hard to separate harmless read‑only operations from destructive ones (mass delete, overwrite, exfiltration, etc.).

That’s also why I don’t buy that sandboxing is “largely academic.” If a sandbox or isolation layer is what stands between a bad script and rm -rf on my notes or filesystem, that’s not academic – that’s the difference between “the script crashed” and “the script just wiped my data.”

If you think something like isolated‑vm isn’t worth the complexity here, fair enough – but then the natural alternative is OS‑level isolation: run the MCP in a dedicated Docker container or user with restricted mounts and permissions.

You REALLY don’t want the possibility that AI can do what AI pleases to do.

  1. Yes, this mcp server is single-user (most are).
  2. The code being executed is visible in the tool call, and you need to manually approve it.
  3. These are templated calls going through an isolated execution environment that you need to manually approve each time. The security risk of sandbox vs execution isolation is largely academic here.
  4. I do want the LLM to propose what it wants to do. That is why I use LLM tools. And this mcp server is following the best practices of Anthropic (the designers of MCP): https://www.anthropic.com/engineering/code-execution-with-mcp

Note that code execution introduces its own complexity. Running agent-generated code requires a secure execution environment with appropriate sandboxing, resource limits, and monitoring.

Source: https://www.anthropic.com/engineering/code-execution-with-mcp

The Anthropic article says "appropriate" controls, not "maximum."

These controls are in place for a single-user stdio MCP server making local API calls.

Right now, we're using vm with fetch and process explicitly revoked. Beyond that, we've got a keyhole API which limits the scope of calls to the joplin api object. Those controls can only be sidestepped if a user manually approves a malicious request.

A joplin-mcp user would specifically need to have their context poisoned, receive a malicious tool call to joplin-mcp, and then manually approve the malicious call in order for a vm breakout to take place. This is the same threat model that claude/gemini-cli themselves have.

So as a user, your threat model remains "look at the tool calls before approving commands" whether you're using claude code / gemini cli alone or using them with joplin-mcp.

I should say that I'm not entirely unsympathetic to what you're saying. It was a conscious engineering decision to go with vm over podman/isolated-vm/wasm for UX reasons, with what is, for me personally, a lateral move w.r.t. threat model (read-before-approving commands).

The tradeoffs currently are:

  1. map Joplin API to MCP endpoints <- Too slow and too token-heavy. Often takes 20-30 seconds or more for multi-step commands when accounting for all of the back-and-forth chatter to the claude backend
  2. use script execution with vm <- state of the art for speed and token usage, easy to install
  3. use script execution with podman/isolated-vm <- state of the art for speed and token usage, requires binaries/system packages to install. Theoretical hardening for specific attack vectors.
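
To make option 2 concrete, here's the shape of what a single approved tool call can do (an illustrative sketch with a stubbed facade; the method names are assumptions, not the real joplin-mcp API):

```javascript
// Stub facade so the sketch runs standalone; in joplin-mcp these would be
// templated local calls to the Joplin API (names here are illustrative).
const joplin = {
  searchNotes: (query) => [{ id: 'a1' }, { id: 'b2' }],
  getNote: (id) => ({ id, title: 'Note ' + id }),
};

// Under option 2, the model submits one script like this: several API
// steps chained locally, a single approved tool call, and one round trip
// to the model instead of one per step.
const hits = joplin.searchNotes('architecture');
const titles = hits.map((h) => joplin.getNote(h.id).title);
console.log(titles.join(', ')); // prints Note a1, Note b2
```

Under option 1, each of those steps would be its own tool call, with the full context shipped back and forth every time.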

If we can get even stronger isolation without breaking user experience, I'd be for it. I love good engineering, so if you have any ideas you want to bounce around, I'd be down to chat more / review commits to the repo.

1 Like

This looks like a good approach

2 Likes

Holy cow! This looks like it could be exactly the engineering control we’re looking for.

For anyone following along, sandbox-runtime is a beta project out of Anthropic that lets us wrap MCP servers with OS-level isolation. It’s TypeScript-based but works with any language, and can be included directly in the MCP server project without complex Docker setups or C++ build requirements. It’s brand new, currently in an early research preview.

Awesome find @simwai!!

@shikuz - cc'ing because it might be interesting for alondmnt/joplin-mcp (MCP server for the Joplin note taking app) too :slight_smile:

3 Likes

Any recommendation on a good model to run along this MCP server? I tried a few, including some "instruct" ones but didn't really get good results

Ahh, bummer. I only use this mcp server with the flagship proprietary models from Anthropic, OpenAI, and Google. It's possible that self-hosted instruct models don't yet have the on-the-fly scripting abilities needed for a good experience.

( I think from context you’re talking about self-hosting here… not my area of expertise)

Right, I should have mentioned I'm trying to get it working with a local LLM.

In my tests google/gemma-3-27b gives good results now and then but it's random and it can occasionally get stuck in a loop (executing the same generated code over and over). It's possible I need to tune its parameters somehow. Most other models either didn't work at all or got stuck in loops.

openai/gpt-oss-20b finds the correct result but doesn't know how to present it and prints "done" instead. The actual result is basically in the log as raw json.

From what I'm reading, open-weight models still aren’t quite on par for multi-step reasoning and generating correct code from API docs on the first try, though that’s just from a casual look.

I do have a local feature branch that sits between direct MCP-to-API mapping and script execution - composite tools for things like generating table of contents and editing note sections. It requires more round trips than a frontier model running scripts, but it should be much more accessible to locally hosted models. Happy to dust it off if you'd like to check it out.

I could also look into whether both modes can coexist - script execution for frontier models, composite tools for lighter environments. Depending on what tooling I find, it might not be a huge maintenance burden to support both.

@simwai - I pushed a new release that uses the sandbox-runtime you suggested.

We now have a setup where joplin api <-http-> broker <-stdio-> runner

The runner is a script execution environment: it is fully cut off from the internet and can't write or execute scripts on the host. It uses the broker as a proxy to reach the Joplin API, and Zod is used on the broker to guarantee that requests from the runner match an allowlist of Joplin API endpoints.
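
The broker-side check boils down to something like this (a dependency-free sketch of what the Zod schemas enforce; the endpoints shown are illustrative):

```javascript
// Dependency-free sketch of the broker's allowlist check (the released
// code uses Zod schemas; the endpoint names here are illustrative).
const ALLOWLIST = {
  'GET /search': (p) => typeof p.query === 'string',
  'GET /notes': (p) => typeof p.id === 'string',
};

function brokerAccepts(method, path, params) {
  const check = ALLOWLIST[method + ' ' + path];
  return Boolean(check && check(params || {}));
}

console.log(brokerAccepts('GET', '/search', { query: 'mermaid' })); // prints true
console.log(brokerAccepts('DELETE', '/notes', { id: '123' }));      // prints false
```

Anything the runner sends that doesn't match an allowlisted shape is rejected before it ever reaches the Joplin API.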

I also trimmed CRUD support for notebooks/attachments/tags for the time being; I'll want to thoughtfully add those back in with Zod support over time.

Thanks again for the fantastic suggestion!

1 Like

Sounds like a huge leap in architecture. Glad I could help out. :ok_hand:

There is a lot of research being done on looping behaviour. One approach could be to detect the loop and then trigger a reset prompt. I've heard this can help, but there may be better solutions to be found in projects like opencode or in research papers.