RFC: Architecture for a Secure Plugin Ecosystem

I feel this analysis is again based on an unverified assumption: that writing rules is hard. But first, are we even going to write any rules? I see the example on their blog: "User input appears to be compared in an insecure manner that allows for side-channel timing attacks." We are definitely not going to manually write down hundreds of rules to cover every single case out there (and miss hundreds of edge cases and possible exploits while doing this). Surely there's a way to automate this, or existing rules we can use?
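For reference, the kind of pattern that rule describes looks roughly like the following. This is a minimal sketch in Python, not taken from the blog post; the function names are made up for illustration.

```python
import hmac


def insecure_equals(supplied: str, expected: str) -> bool:
    # Ordinary equality can return as soon as a difference is found,
    # so the time taken may leak how much of the input was correct.
    return supplied == expected


def constant_time_equals(supplied: str, expected: str) -> bool:
    # hmac.compare_digest is designed so that its running time does not
    # depend on where the two inputs differ.
    return hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))
```

A ready-made rule would flag the first function when `supplied` comes from user input and suggest something like the second.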

And if we do have to write certain rules, this is now trivial thanks to AI. I'm not saying that because of that we should use CodeQL, but the whole analysis seems to be based on something we won't do (writing rules), or something that is assumed to be hard but is not (thanks to AI).

As far as I can see hallucinations are rare in code reviews these days and it's only going to get better, so that shouldn't be a reason not to use them.

Before using them anyway we need to determine: do we need to use them? Perhaps you can start evaluating the code review tools on some plugins and see what comes out?

I've added the current workflow, and the workflow that we'll achieve after this system is live, in the first and last sections of the proposal.

Thanks for adding this, but the proposal shouldn't start with this. Please review its structure perhaps based on https://discourse.joplinapp.org/t/gsoc-2026-how-to-submit-your-proposal-draft/49137

But if this is something that overcomplicates the process, we can drop it and the pipeline would still work fine.

I think what confuses me is that you talk about multiple workflows and some of them "waking up", etc. Please get yourself familiar with GitHub Actions - what you describe could be just one workflow with sequential jobs, and each job can have its own secure context.

I've shifted the /approve based CI to label-based execution, so now only users with Triage, Write, or Maintainer/Admin access can trigger the approval flow, using the status : approved label.

So does that seem like a better approach to you? You don't need to agree with me automatically, I can be completely wrong sometimes too. So please keep looking at it critically.

Yes, these parts are not something I have finalized yet. This is just an extension of my previous thought process.
I'm manually comparing different tools right now to see what problems can arise and what the end review looks like, which will eventually decide whether I think we should use an LLM or not.

I'll look deeper into GitHub Actions, though in case my wording caused confusion I want to clarify that right now there are only 2 GitHub Actions workflows: review.yml, for the review when the issue is opened, and publish.yml, for the build and distribution of the plugin (the split-job is a part of it).

Yes, it does seem like a better approach to me. While both methods are almost the same, a label-based trigger will in my opinion be the better option, as the label itself on the issue screen will serve as an indicator (and also a filter) of which issues are approved and revoked.

Additionally, using /approve requires triggering a check on each comment, whereas with a label the CI will only trigger on the labeled event, which only a person you choose can cause.
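The gating logic could be sketched as below. This is only an illustration under assumptions: the event dictionary mirrors the shape of a GitHub issues webhook payload (`action` and `label.name`), and `actor_permission` is assumed to have been looked up separately for the user who applied the label. The function name and the permission strings are hypothetical, not part of the actual workflow.

```python
APPROVAL_LABEL = "status : approved"

# Access levels that are allowed to trigger the approval flow (assumed names).
ALLOWED_PERMISSIONS = {"triage", "write", "maintain", "admin"}


def should_run_approval(event: dict, actor_permission: str) -> bool:
    """Return True only for a 'labeled' event carrying the approval label,
    applied by someone with sufficient access."""
    if event.get("action") != "labeled":
        # Comments, edits, and other label events never start the flow.
        return False
    label_name = event.get("label", {}).get("name", "")
    return label_name == APPROVAL_LABEL and actor_permission in ALLOWED_PERMISSIONS
```

Since GitHub already restricts who can apply labels, the explicit permission check is a second line of defence rather than the only one.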

Hi Laurent,
I ran a few tests with CodeQL and Semgrep on already published plugins, and found that the best approach was to combine the CodeQL and Semgrep reports and then sanitize the combined report using Gemini (or any LLM). I ran both CodeQL and Semgrep in a single run and the review time was not too long either.

CodeQL and Semgrep were good at completely opposite things and complemented each other's use cases, so we can integrate both of them for the SAST review.

I have updated the tooling section with the screenshots and observations. That said, I am still running a few more tests. Are the observations I have mentioned enough, or should I describe something more?
I've removed the screenshots as they were cluttering the proposal and making it too long and hard to read.

My other question: since I only have 10 days until the coding period starts, is there any other specific part of the proposal, other than tooling, that needs a fix or a different approach?

You haven't analysed the existing system as I suggested - that's important because some of your decisions (e.g. the two-workflow split) seem to come from misunderstanding what's already there. Again, please review the existing system in detail and add a section about it.

The Proposed DX: A Dedicated CLI Gateway

Is joplin-plugin publish intended to be a new package? As mentioned earlier, we already ship some tooling with the plugin generator, so I was wondering why you chose not to build on top of the existing tooling instead?

actively holds push or admin rights to the repository URL submitted in the Issue

It's a bit difficult to review this proposal because it feels like it misses the mark in a few places.

For example, I'm not sure I understand the reasoning around repository ownership verification. Why does the submitter need admin or push rights to a repository, and which repository exactly? Does it even matter? We have a repository URL and a commit hash, that's what gets reviewed. I'm not sure why it's important who is admin on what.

More generally, there are a lot of security terms and architectural concepts throughout the proposal, but the reasoning that leads to these decisions is often unclear. It would help to simplify some parts and explain with your own words what concrete problem is being solved and why the proposed solution is necessary.

The SAST Engine: CodeQL + Semgrep

This section makes a lot of strong claims without providing any source or methodology to explain how these conclusions were reached. Terms such as "dangerous blind spots" or "insufficient code-security depth" are strong assertions that should be backed by equally strong evidence.

Overall this section is not good enough and possibly overcomplicated, given how important that part is. Ideally we would use a single tool. If you believe using more tools is necessary, please provide concrete evidence.

3.2 Supply Chain Security: Socket.dev :

This seems like copy and paste from their marketing documents. Is that a third tool you want to introduce? Why?

3.3 The LLM (Noise Reduction) :

Benchmarked Results: In sandbox testing with intentionally malicious Joplin plugins, the LLM successfully deduplicated 35 raw CodeQL/Semgrep alerts into 3 actionable findings, eliminating roughly 91% of false-positive noise for the reviewer while retaining critical threats.

I'm not convinced that having an LLM review the findings of CodeQL would help actually. The LLM needs the context of the whole plugin code to do anything useful, in which case, maybe they could do the whole analysis and we get rid of the other tool?

But I feel there's not enough information to make a decision. You mentioned reviewing "intentionally malicious Joplin plugins". Where are these plugins? How many? And where's the detailed analysis?

4.1 The Split-Job Trust Boundary :

I wish you'd use fewer buzzwords in the proposal. Technical readers are not going to be convinced by dramatic wording. What matters here are sources and clear reasoning.

This section tends to exaggerate the security implications instead of explaining concretely what the actual problem is and why the proposed architecture is necessary.

4.2 Handling Compromised Plugins (Obsoletion) :

Let's say that's a stretch goal for now. I think there's enough to do as it is.

4.3 Vulnerability Disclosure Policy :

Also a stretch goal

5.2 Namespace Locking :

This section makes me realise I didn't see anything about how you ensure plugin IDs are unique. Did you review how we currently enforce uniqueness? And if so, what would you suggest as an alternative? This alone should be its own section - please add it, it's very important.

Going forward, I won't have time to re-read the full proposal each round. To make my reviews efficient:

  1. Please start every update with a changelog at the top: bullet list of "Changed X (was Y) - addresses your feedback from [date/topic]." If there's no changelog, I'll skip the update.

  2. For each change, quote the previous wording or say "previously: ..." so I can see the delta without hunting for it.

  3. End every update with a numbered list of explicit questions you need me to answer. If it's not on that list, I'll assume you're not blocked.

Thank you for the detailed and direct feedback,
I have updated the main proposal to address all of your points. Please check out the updated proposal whenever you have the time!

Thank you for the update.

  1. Security Scanning Pipeline & Tooling Trade-offs :

The conclusion of your LLM review contradicts your tool choice - you seem to be saying LLM is superior to all the other tools, yet you choose the other tools. Either the LLM results are not accurate, or the tool choice is wrong. Both can't be true.

Test1 : test - sca + sast · akshajrawat/Joplin-tooling-test@397f6d8 · GitHub

Thank you for sharing the test results. As it is, it's not very useful though, because it's not tailored to Joplin and the plugin system. That makes me realise that one thing is missing that should somehow be fed to any tool we use: the threat model.

Please could you add a section about this? What are we trying to achieve exactly? For example we are not scanning for code quality or bugs, we are also not scanning for DDoS and most likely not for XSS either.

Also make sure the tools are scanning the right code. For some reason it includes "@joplin/fork-htmlparser2" which is not a dependency of org.joplinapp.plugins.YesYouKan. And even if it is, that would be a bug to fix in the app, not in a plugin.
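One way to keep a report limited to the plugin's own code is to filter findings by path before they reach the reviewer. The sketch below is an assumption-laden illustration: it presumes each finding is a dictionary with a `path` key relative to the plugin root, and that the plugin source lives under `src`. Neither is confirmed in this thread, and the function names are hypothetical.

```python
from pathlib import PurePosixPath


def in_plugin_scope(path: str, source_root: str = "src") -> bool:
    """True if the path is inside the plugin's own source tree."""
    parts = PurePosixPath(path).parts
    if "node_modules" in parts:
        # Installed dependencies are not the plugin author's code.
        return False
    return len(parts) > 0 and parts[0] == source_root


def filter_findings(findings: list, source_root: str = "src") -> list:
    # Keep only findings that point at files the plugin author wrote.
    return [f for f in findings if in_plugin_scope(f["path"], source_root)]
```

This only narrows what gets reported for the plugin's own code; it says nothing about how dependencies themselves should be checked.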

For now let's say we are going to use just one tool, and the architecture should be flexible enough to add more later on. Please pick the best one.

  2. Plugin ID Uniqueness :

Thank you for addressing this. Two things that need to be incorporated:

  • A repository URL can have multiple forms like https://github.com/Foo/Bar, https://github.com/Foo/Bar.git. So you need to normalise the URL before comparison. That needs to be in the spec

  • You didn't mention what should happen if a plugin is moved to a different repository. For now let's say that once a plugin is published its repository cannot be changed - please mention it in the spec too.
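The two rules above could be sketched as follows. This is a minimal illustration, not the spec: the function names and the registry shape (a mapping from plugin ID to the normalised repository URL of the published plugin) are assumptions, and only HTTPS-style URLs are handled. Lower-casing the path relies on GitHub treating owner and repository names case-insensitively.

```python
from urllib.parse import urlsplit


def normalise_repo_url(url: str) -> str:
    """Reduce equivalent forms of a repository URL to one canonical string."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return f"https://{host}{path.rstrip('/').lower()}"


def check_submission(registry: dict, plugin_id: str, repo_url: str) -> str:
    """Classify a submission as 'new', 'update', or 'rejected'.

    registry maps plugin ID -> normalised repository URL (assumed shape).
    """
    candidate = normalise_repo_url(repo_url)
    known = registry.get(plugin_id)
    if known is None:
        return "new"
    # Once published, a plugin's repository cannot be changed.
    return "update" if known == candidate else "rejected"
```

With this, `https://github.com/Foo/Bar` and `https://github.com/Foo/Bar.git` compare as equal, and a submission for an existing ID from a different repository is refused.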

do you have a specific requirement to scan third-party dependencies despite the noise?

For the most part things like npm audit are not useful, as they simply list CVEs and more often than not it's not actionable. You're right though that we shouldn't ignore dependencies - for example if a plugin has a dependency to a purpose built malicious package.

But for this npm audit won't help. So any tool you use should somehow handle this - not listing CVEs but scanning the dependencies for actual malicious code. Perhaps here it will make sense to have a second tool. You don't have to have a solution right now though, but please mention this in the spec.

Proposal scope

Finally, I feel the project may be a bit too large given the time frame and I would prefer you focus on a few aspects and do it well rather than looking at many different parts and losing focus.

What I would suggest for now is to build a proof of concept, something that we can release without modifying the existing plugin release flow. That will allow us to properly test and evaluate the solution.

So the part you could include are these:

  • Submission flow: You create the CLI tool to publish plugins (npm run publish), but you don't disable the existing npm publish. The new npm run publish for now will publish to a test repository. It also creates the issue in that repository.

  • Plugin ID enforcement: This will need to be handled either at the CLI tool level or on the repository workflows.

  • Repository workflow: you can implement the whole review process using labels, as well as the workflows which should result in an issue containing the vulnerability findings. Those findings are more than just a test - part of the project is to evaluate what should be in that report. And for now please use only one tool.

Additionally:

  • For the issue creation, simply require the absolute minimum that is needed for the user to create that issue in our repo, via the CLI tool. It can either be via Personal Access Token or via OAuth. Please pick one.

  • The threat model is also needed as it drives many of the other changes.

I think we are close now - please update the proposal with the above changes and let me know if you have any questions.

Thank you for the detailed feedback. I’ve updated the proposal to reflect those changes!!

Sorry this is not much better than it was and you seem to have expanded the scope instead of narrowing it.

IF LLM IS REJECTED : (CodeQl)

I understand that it means I need to decide for you? There should not be multiple sections about the tools at this point. By now I would expect that you have researched that in depth and you are able to decide by yourself which one to go for.

This seems to be due to "@joplin/utils": "^3.0.1" dep which explicitly lists @joplin/fork-htmlparser2 as a direct dependency. The security tool looks at that entire chain.

That's a disappointing answer. I would have expected you'd look at what can be done to avoid these false positives. Or are you saying that's how it is and we can't do anything about it? That's going to be a problem because it makes the reports unusable for us.

CLI Version Sync: To ensure all developers submit against the latest security rules, the CLI includes a "version check" pinging the NPM registry to prevent submissions from outdated CLI versions.

"submit against the latest security rules" - but why? The security rules are on the server, not on this local tool. Please clarify your reasoning for this new feature.

You also didn't address this comment:

But for this npm audit won't help. So any tool you use should somehow handle this - not listing CVEs but scanning the dependencies for actual malicious code. Perhaps here it will make sense to have a second tool. You don't have to have a solution right now though, but please mention this in the spec.

Finally please remove the stretch goals, there's already enough in there.

Sorry for the lack of context. I did look into ways to suppress it, but I thought that since the SCA part would potentially need a second tool, which is not in scope right now, I didn't have to mention it yet. I have now added everything I have found for suppressing this.

Sorry, this part was a copy-paste mistake on my side, as I am using the Joplin app to edit the proposal and pasting it here block by block. I left in an old paragraph from my previous new-CLI approach; it was supposed to be removed.

I have updated this section with all the needed context. Please review it whenever you get time.

Since the scope is now a PoC, I would need a test plugin repository that is a fork of the joplin/plugins repo. Will that also be provided by you?

I've created joplin/plugins-test, to which you should have access

Also, the new npm run publish can be added to the original generator-joplin repository without any regression, so should it be done there or will we make a separate repository for that too?

Yes it will be part of generator-joplin

And if the publish flow suggested right now is good enough, can I start implementing it? The publish flow is as follows:

Please start. But your reasons for choosing an LLM are not very convincing. You've removed the information about the other tools now but I would have liked to see an in-depth comparison of them, based on several existing plugins of various sizes, and based on the threat model that you have now defined. Please create this in a separate forum post so that you have more space to do a proper analysis without squeezing it into a section.

As a result, even if you start the implementation, make sure that the part that scans the plugins is modular and can be swapped to a different tool. We do not want to be vendor locked to Google or other vendors.

Thank you! I have verified access to the plugins-test repository, and I will begin implementing the publish flow directly within generator-joplin.

I will ensure that the scanning component is strictly separated from the core pipeline via a clean interface wrapper. The CI will treat the scanner as a black box that takes the plugin paths and outputs standardized JSON. This guarantees we can swap the tool from Gemini to Semgrep, CodeQL, or any other static analysis engine down the line with zero regression to the workflow.
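A rough sketch of what such a wrapper could look like is below. Everything in it is hypothetical: the `Finding` fields, the `Scanner` protocol, and the report layout are placeholders to show the shape of the idea, not the agreed format. `StubScanner` stands in for a real adapter around Gemini, Semgrep, or CodeQL.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Protocol


@dataclass
class Finding:
    rule_id: str
    severity: str
    path: str
    line: int
    message: str


class Scanner(Protocol):
    """Anything with a name and a scan() method can be plugged in."""

    name: str

    def scan(self, plugin_path: str) -> List[Finding]: ...


class StubScanner:
    """Placeholder adapter; a real one would invoke the actual tool."""

    name = "stub"

    def scan(self, plugin_path: str) -> List[Finding]:
        return [Finding("demo-rule", "low", "src/index.ts", 1, "example finding")]


def run_scan(scanner: Scanner, plugin_path: str) -> str:
    # The pipeline only sees this JSON, never the tool behind it.
    report = {
        "scanner": scanner.name,
        "plugin_path": plugin_path,
        "findings": [asdict(f) for f in scanner.scan(plugin_path)],
    }
    return json.dumps(report, indent=2, sort_keys=True)
```

Swapping tools then means writing another class with the same `scan` method; the workflow calling `run_scan` stays unchanged.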

I have a small question regarding the PR.
Should I open a single PR for the whole publish flow, or should it be done in phases too? And do I also have to open an issue for it?
There will be 4 steps + 2 utils, each in a separate file. These are the steps:

  • Metadata verification and extraction
  • Local commit hash and remote commit hash validation
  • GitHub authentication flow
  • Issue submission
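As an illustration of the second step, the commit hash validation could look something like the sketch below. It assumes the check means "the local HEAD commit is the same as the one on the remote", uses the standard `git rev-parse` and `git ls-remote` commands, and assumes 40-character SHA-1 hashes. The function names are hypothetical.

```python
import re
import subprocess
from typing import Optional

SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def parse_ls_remote(output: str, ref: str = "HEAD") -> Optional[str]:
    """Extract the hash for `ref` from `git ls-remote` output (hash<TAB>name)."""
    for line in output.splitlines():
        sha, _, name = line.partition("\t")
        name = name.strip()
        if (name == ref or name.endswith("/" + ref)) and SHA_RE.match(sha):
            return sha
    return None


def local_head(repo_dir: str) -> str:
    # Hash of the commit currently checked out in the local clone.
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def remote_head(repo_url: str, ref: str = "HEAD") -> Optional[str]:
    # Hash the remote repository currently reports for the given ref.
    result = subprocess.run(
        ["git", "ls-remote", repo_url, ref],
        check=True, capture_output=True, text=True,
    )
    return parse_ls_remote(result.stdout, ref)


def hashes_match(local_sha: str, remote_sha: Optional[str]) -> bool:
    # A missing remote hash (ref not found) counts as a mismatch.
    return remote_sha is not None and local_sha == remote_sha
```

Keeping the parsing separate from the `subprocess` calls makes the comparison logic testable without network access.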

Also, do I have to write tests for the publish flow?
For the GitHub authentication flow I will need an official GITHUB_CLIENT_ID of a GitHub OAuth app with device flow enabled.

Please don't forget this, I'm not sure if you saw it:

I would have liked to see an in-depth comparison of them, based on several existing plugins of various sizes, and based on the threat model that you have now defined. Please create this in a separate forum post so that you have more space to do a proper analysis without squeezing it into a section.

Yes please split the work into small PRs

Yes, sorry, I'll provide it shortly. I had 2 exams over the past 4 days so I was unable to give it much time.