That’s a fair point, especially with loosely titled notes.
In my approach, I’m not relying only on headings. While building the tree, each node also stores a short LLM-generated summary of its content. So even if a note is titled something like “Monday meeting”, the node would still capture what the meeting was actually about.
This makes the structure semantically aware rather than purely title-based. The idea is to combine structure with lightweight semantic understanding instead of depending entirely on embeddings.
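As a rough sketch of what such a tree node could look like (the names here are hypothetical, not an existing Joplin API), each node carries a short LLM-generated summary alongside its title, so a keyword scan can match on meaning even when the title is vague:

```typescript
// Hypothetical node of a summary tree: each node stores a short
// LLM-generated summary alongside its title, so retrieval can match
// on content even when the title is as vague as "Monday meeting".
interface SummaryNode {
  title: string;           // the note or heading title
  summary: string;         // short LLM-generated summary of this subtree
  children: SummaryNode[];
}

// Depth-first keyword scan over titles *and* summaries.
function findNodes(root: SummaryNode, query: string): SummaryNode[] {
  const q = query.toLowerCase();
  const hits: SummaryNode[] = [];
  const visit = (node: SummaryNode) => {
    if (
      node.title.toLowerCase().includes(q) ||
      node.summary.toLowerCase().includes(q)
    ) {
      hits.push(node);
    }
    node.children.forEach(visit);
  };
  visit(root);
  return hits;
}
```

With this, a query like "budget" would still find a note titled "Monday meeting" whose summary mentions the budget discussion.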
That said, I agree it may not fully replace semantic search in all cases, but it could still work well as a complementary or lightweight alternative, especially for structured markdown notes.
If any GSoC contributor is reading this: as noted above, this could be a good opportunity for a project, so don't hesitate to create a proposal if you have some ideas.
@adamoutler thanks for the detailed breakdown, lots of useful thinking in there. A few things I want to pick up on:
Your reranking observation is interesting: that it matters more for smaller on-device models than for larger cloud models. Good to keep in mind for the infrastructure project, since we'd want to support both.
The hybrid search idea (keyword ↔ vector slider) is a nice concrete way to think about the UI for that.
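The slider could map directly onto a score blend; a minimal sketch (all names hypothetical, scores assumed already normalised to [0, 1]):

```typescript
// Hypothetical hybrid ranking for the keyword <-> vector slider idea:
// alpha = 1 is pure keyword search, alpha = 0 is pure vector search,
// and values in between blend the two normalised scores.
interface ScoredDoc {
  id: string;
  keywordScore: number; // normalised to [0, 1]
  vectorScore: number;  // normalised to [0, 1]
}

function hybridRank(docs: ScoredDoc[], alpha: number): ScoredDoc[] {
  const blended = (d: ScoredDoc) =>
    alpha * d.keywordScore + (1 - alpha) * d.vectorScore;
  return [...docs].sort((a, b) => blended(b) - blended(a));
}
```

The UI slider then only has to set `alpha`; the rest of the search stack stays unchanged.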
The "negative friction" idea for MCP design makes a lot of sense. Keeping round trips and context usage low matters. There are already a few Joplin MCP servers out there (including one I maintain) that work with the desktop client's Data API. Yours works with Joplin Server, which is cool, I don't think anyone else has explored that space. For the GSoC projects though, which run as plugins / core inside the app, I think we can keep things simpler by describing tools directly in LLM calls rather than going through MCP (as I described in my reply above). Whether Joplin should bundle an MCP server inside the app for external consumers is a different question.
On vector DB choices: the discussion here is mostly about plugin-level projects that need to work inside the app on desktop and mobile. That narrows things down quite a bit. Your overview is very helpful, and I agree that sqlite-vec looks like a natural starting point given its cross-platform support. Making sure whatever we pick actually works on mobile (where FS access is limited) should be part of any infrastructure project. We may need to include PRs to the mobile app.
@Krishh interesting idea with the hierarchical summary tree. LLM-generated summaries at each node are more robust than headings alone, and a nice complement to embedding-based retrieval. As @adamoutler noted, pure structural approaches can miss semantic relationships, but combining a summary tree with Joplin's search tools (as I described above) could give you the best of both. Might be worth exploring as part of the search project.
This discussion aligns very closely with what I’ve been thinking while working on my proposal.
For the past four days I've been planning a proposal combining ideas 1, 3, and 4, because after researching them thoroughly it became clear that they fall under a single umbrella: they share the same retrieval foundation.
I kept coming back to the fact that all three require building the embedding pipeline first and then letting each feature consume it.
This approach also addresses the duplication issue: with separate plugins, each one builds its own pipeline and the same work is repeated several times, when it could instead be a standalone pipeline shared by all three features (and more in the future), reducing waste in both memory and architecture.
I’m currently thinking of framing the project around this shared infrastructure as the core deliverable, with a few minimal consumer features (search, chat, auto-tagging) implemented mainly to validate and demonstrate the system end-to-end rather than as fully independent products.
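The shape I have in mind could be sketched like this (interface and names are hypothetical, just to show the separation between infrastructure and consumers):

```typescript
// Hypothetical shape of the shared infrastructure: one pipeline owns
// chunking, embedding and storage; search, chat and auto-tagging are
// thin consumers of the same narrow query interface.
interface RetrievalResult {
  noteId: string;
  score: number;
}

interface EmbeddingPipeline {
  index(noteId: string, body: string): Promise<void>;
  query(text: string, topK: number): Promise<RetrievalResult[]>;
}

// Each consumer feature depends only on the interface, never on the
// backend, so the pipeline can evolve without touching the features.
async function semanticSearch(
  pipeline: EmbeddingPipeline,
  text: string,
): Promise<RetrievalResult[]> {
  return pipeline.query(text, 10);
}
```

Chat and auto-tagging would be similar thin wrappers over `query`, which is what makes them cheap to build once the pipeline exists.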
Would love to hear if this direction aligns with how you’d expect these ideas to be approached, or if there are constraints in Joplin’s current architecture that would push this in a different direction.
I’ll be creating a separate discussion post to explore this approach in more detail and clarify a few open questions, would really appreciate any guidance or feedback there.
As someone working on the categorisation proposal, I've already empirically validated the compressed similarity range issue with nomic-embed-text during POC development: notes with short bodies required thresholds below 0.60 regardless of topic distance. I'm planning to implement the put(note)/query(text) interface with a swappable backend so it can migrate to shared infrastructure later.
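A sketch of that put(note)/query(text) interface with a swappable backend (hypothetical names; the toy backend below scores by word overlap purely for illustration, where a real backend would embed with a model such as nomic-embed-text and store vectors in something like sqlite-vec, without the callers changing):

```typescript
// put(note)/query(text) with a swappable backend. The in-memory toy
// implementation scores by word overlap; a real backend would swap in
// model embeddings and a vector store behind the same interface.
interface Note {
  id: string;
  body: string;
}

interface VectorBackend {
  put(note: Note): void;
  query(text: string, topK: number): { id: string; score: number }[];
}

class InMemoryBackend implements VectorBackend {
  private notes: Note[] = [];

  put(note: Note): void {
    this.notes.push(note);
  }

  query(text: string, topK: number): { id: string; score: number }[] {
    const q = new Set(text.toLowerCase().split(/\s+/));
    return this.notes
      .map(n => {
        const words = new Set(n.body.toLowerCase().split(/\s+/));
        let overlap = 0;
        for (const w of q) if (words.has(w)) overlap++;
        return { id: n.id, score: overlap / Math.max(q.size, 1) };
      })
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }
}
```

Categorisation code written against `VectorBackend` would then migrate to the shared infrastructure by swapping the implementation, not the call sites.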
Quick question for @shikuz - for the categorisation use case specifically, would you recommend sqlite-vec embedded directly in the plugin, or designing around an external service interface from the start?
I’m a paid user of Joplin for several years; I compared many note-taking apps and Joplin was clearly the optimal choice for me. I’m no technophobe; I made my living in software development, now retired. However, I’m a confirmed skeptic when it comes to AI; I’ve turned it off in all the desktop and Android apps where possible. So, I would implore anyone working on AI features to please make them optional, via an easily selectable on/off switch. Thanks.
Any such feature would definitely be optional. We have no intention to push these as some companies do. We hope, however, that whatever is developed will be useful, and if it is, users can enable it themselves.
When done correctly, AI is a completely transparent net-positive. Semantic search can provide enhanced contextual understanding using on-device, local-only models.
eg. Someone searches for “kitty” but they wrote the word cat. The semantic understanding provided by the vector model handles that automatically while the traditional search continues to provide the most recent exact match.
eg2. they search for “kitty” again, and find that note where they talked about “tom” and a picture of a cat.
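The mechanism behind the "kitty" example is just proximity in embedding space; a toy illustration with made-up 2-d vectors (a real model produces hundreds of dimensions, but the cosine-similarity math is the same):

```typescript
// Toy illustration of why "kitty" can find a note about a cat: related
// words land close together in embedding space, so cosine similarity is
// high even with zero keyword overlap. Vectors here are invented 2-d
// stand-ins; real models produce much higher-dimensional embeddings.
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  return dot / (Math.hypot(...a) * Math.hypot(...b));
}

// Hypothetical embeddings: "kitty" and "cat" nearby, "invoice" far away.
const kitty = [0.9, 0.1];
const cat = [0.85, 0.2];
const invoice = [0.1, 0.95];
```

Traditional exact-match search still runs alongside, so the most recent literal hit is never lost; the vector side just adds the "kitty" → cat matches.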
As far as the user knows, it’s not AI. It’s just a really good search function. This sort of AI SHOULD be pushed. Joe Shmoe doesn’t know how it works, or why it works. They just know it is better… as long as it’s done right.
That’s a great insight. Initially I was thinking of adding a semantic search layer, but your approach feels simpler and more practical than introducing that extra complexity.
I think we can start with Joplin search + structured tree, and optionally use semantic search as a fallback when keyword-based retrieval doesn’t work well. That way it stays simple while still handling harder cases.
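That fallback flow can be sketched in a few lines (the function types are hypothetical, just capturing "keyword first, semantic only when keyword comes up thin"):

```typescript
// Sketch of the semantic-fallback flow: Joplin keyword search runs
// first; semantic search is only consulted when the keyword results
// look too thin. Function types are hypothetical placeholders.
type Search = (query: string) => string[]; // returns note IDs

function searchWithFallback(
  keyword: Search,
  semantic: Search,
  query: string,
  minHits = 1,
): string[] {
  const hits = keyword(query);
  return hits.length >= minHits ? hits : semantic(query);
}
```

This keeps the common case cheap (no embedding lookup at all) while still covering the harder queries where keywords fail.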