About semantic search implementation in Joplin

Hey, I'm Yahya (GitHub: yahya94812).

I built semantic search for Joplin: yahya94812/Semantic-Search on GitHub. The project searches documents and notes semantically rather than by traditional text matching.

It lets you find relevant notes by meaning rather than exact keywords. Here's how it works:

  • Generates embeddings for all notes using all-MiniLM-L6-v2
  • Stores them in a vector database
  • At search time, embeds the query and retrieves the most similar notes via vector similarity
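The three steps above can be sketched roughly as follows. This is only an illustration, not the code from the repository: `toy_embed` is a hypothetical stand-in (hashed bag-of-words, not semantic at all) so the example runs with the standard library alone, and the `NoteIndex` class is an in-memory dictionary standing in for a real vector database. In an actual setup the `embed` callable would wrap all-MiniLM-L6-v2.

```python
import hashlib
import math
import re
from typing import Callable

Vector = list[float]


def toy_embed(text: str, dim: int = 256) -> Vector:
    """Hypothetical stand-in embedder: hashed bag-of-words, L2-normalised.

    It only matches shared words; a real setup would call a sentence
    embedding model such as all-MiniLM-L6-v2 here instead.
    """
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class NoteIndex:
    """In-memory stand-in for the vector database (an assumption)."""

    def __init__(self, embed: Callable[[str], Vector] = toy_embed) -> None:
        self.embed = embed
        self.vectors: dict[str, Vector] = {}

    def add(self, note_id: str, text: str) -> None:
        # Step 1 and 2: embed the note and store its vector.
        self.vectors[note_id] = self.embed(text)

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        # Step 3: embed the query and rank notes by cosine similarity.
        q = self.embed(query)
        scored = [(nid, cosine(q, v)) for nid, v in self.vectors.items()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

Usage would look like `index = NoteIndex(); index.add("a", "grocery list milk eggs"); index.search("milk and eggs", k=1)`. Swapping in a real model only means passing a different `embed` callable.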

Would love to hear if there are better approaches worth exploring!

This by itself would be a great improvement if implemented in Joplin's internal search.

For even better results, we should use the Markdown structure for text chunking. Each chunk should also carry contextual information: how it fits into the heading hierarchy above it within the note, how it contributes to the summary of the whole note, and a short summary of any chunks, links, or images it references in the same or other notes. These “rich” pieces of text then go into the vector DB.
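To make the chunking idea concrete, here is a minimal sketch that covers only the heading-hierarchy part. It splits a note on ATX headings (`#` to `######`), skips anything inside fenced code blocks, and prefixes each chunk with its heading path before embedding. The names `Chunk`, `chunk_markdown`, and `to_embedding_text` are made up for this example, and the note-level summaries and link/image summaries described above are not attempted here.

```python
import re
from dataclasses import dataclass

# ATX heading: 1-6 '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


@dataclass
class Chunk:
    heading_path: list[str]  # e.g. ["Setup", "Linux"]
    body: str

    def to_embedding_text(self, note_title: str) -> str:
        """Prefix the body with 'note > heading > subheading' context."""
        context = " > ".join([note_title, *self.heading_path])
        return f"{context}\n\n{self.body}"


def chunk_markdown(markdown: str) -> list[Chunk]:
    chunks: list[Chunk] = []
    path: list[tuple[int, str]] = []  # stack of (level, title)
    buffer: list[str] = []
    in_fence = False

    def flush() -> None:
        body = "\n".join(buffer).strip()
        if body:
            chunks.append(Chunk([title for _, title in path], body))
        buffer.clear()

    for line in markdown.splitlines():
        # Toggle on ``` fences so '#' lines inside code are not headings.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at the same or a deeper level than the new one.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2)))
        else:
            buffer.append(line)
    flush()
    return chunks
```

Each chunk's `to_embedding_text(note_title)` output is what would be embedded and stored, so a query can match on the surrounding headings as well as the chunk body. Setext headings (underlined with `===` or `---`) and `~~~` fences are not handled in this sketch.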

During search, we apply hybrid sparse + dense retrieval, then a re-ranker LLM to sort and filter by relevance, and finally a regular LLM to decide which subset of the results should be shown to the user.
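The post does not say how the sparse and dense results are combined. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than a description of the setup above. Each input is a ranked list of note IDs (for example one from keyword/BM25 search and one from vector search), and `k = 60` is the conventional smoothing constant. The note IDs are made up for illustration, and the re-ranker and final LLM stages are not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked ID lists into one list of (id, score).

    Each document scores sum(1 / (k + rank)) over the lists it appears
    in, with rank starting at 1. Ties are broken by ID for determinism.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical results from a keyword search and a vector search.
sparse_hits = ["note-3", "note-1", "note-7"]
dense_hits = ["note-1", "note-5", "note-3"]

fused = reciprocal_rank_fusion([sparse_hits, dense_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Notes that appear in both lists (`note-1`, `note-3`) end up ahead of notes found by only one retriever, which is the point of the fusion step. The fused list is what a re-ranker would then receive as candidates.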

This approach is less demanding than GraphRAG and similar architectures, but it has worked remarkably well in my local RAG setup.

I think that implementing an LLM-based reranker would require third-party LLMs accessed through API keys. This can certainly be done, but I think it should be kept as an optional, additional feature.

Embedding-based search works well for fully local semantic search, especially for everyday users who may not want to rely on external APIs.

Your idea of Markdown-aware chunking is great, as it helps preserve the semantic meaning and metadata of each block.

So I’m looking forward to implementing Markdown-aware chunking as an experimental feature.
