Implementing graph generation using AI ; GSoC Idea no. 02

Hello everyone,

I would like to work on implementing graph generation for notes using AI, as mentioned on the GSoC ideas page.

I have created a draft proposal: Project proposal - Google Docs

To implement this idea, I propose the following steps:

  1. Concept extraction/summarisation of individual note entries

  2. Establishing relationships between these concepts

  3. Storing this data in a readily available form, e.g. a graph database

  4. Creating a UI for visualising and interacting with this data

For implementing these steps, I propose:

  • Concept extraction from a note: we can use entity-recognition libraries such as spaCy or KeyBERT, and use the extracted concepts as nodes in our graph.

  • Establishing relationships between notes: we can use sentence transformers to create embeddings and store them in vector databases such as FAISS or Chroma. These embeddings let us derive relationships between nodes using semantic similarity or concept co-occurrence.

  • A graph database can be used to store these results and drive the visualisation, for example Neo4j for storage and Neo4j Bloom for visualisation.

  • We may also integrate a local or cloud LLM, e.g. via Ollama or llama.cpp, as the use case requires.
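To make the first three steps concrete, here is a minimal sketch in plain Python: a stubbed keyword extractor stands in for spaCy/KeyBERT, and a dictionary of adjacency sets stands in for a graph database. Everything here (the keyword list, the note texts, the function names) is illustrative, not the actual implementation:

```python
from collections import defaultdict
from itertools import combinations

# Stub extractor: in the real pipeline this would be spaCy NER or KeyBERT.
KEYWORDS = {"graph", "neo4j", "embedding", "llm", "joplin"}

def extract_concepts(note_text: str) -> set[str]:
    words = {w.strip(".,;:").lower() for w in note_text.split()}
    return words & KEYWORDS

def build_graph(notes: dict[str, str]) -> dict[str, set[str]]:
    """Add an edge between two concepts whenever they co-occur in a note."""
    adjacency: dict[str, set[str]] = defaultdict(set)
    for text in notes.values():
        for a, b in combinations(sorted(extract_concepts(text)), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency

notes = {
    "n1": "Joplin notes could feed an LLM to build a graph.",
    "n2": "Store the graph in Neo4j; derive edges from embedding similarity.",
}
graph = build_graph(notes)
print(graph["graph"])  # concepts co-occurring with "graph" in some note
```

Swapping the stub for a real extractor and the dictionary for Neo4j would not change the shape of this pipeline, which is what makes the co-occurrence approach easy to prototype first.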

I would like to hear from mentors and fellow users whether this proposal covers the requirements specified in the ideas page.

I would also like to add one more step to this idea, where AI is used to cluster related notes into groups before generating the graphs. This would help when a collection of notes spans diverse topics without segregation into subfolders.
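As a sketch of that pre-clustering step, here is a greedy threshold grouping over toy embedding vectors. In practice the vectors would come from a sentence transformer; the three-dimensional vectors, note names, and threshold below are all made up for illustration:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def cluster(vectors: dict[str, list[float]], threshold: float = 0.9) -> list[set[str]]:
    """Greedy clustering: a note joins the first group whose seed vector is close enough."""
    clusters: list[set[str]] = []
    seeds: list[list[float]] = []
    for name, vec in vectors.items():
        for group, seed in zip(clusters, seeds):
            if cosine(vec, seed) >= threshold:
                group.add(name)
                break
        else:  # no existing group matched: start a new one
            clusters.append({name})
            seeds.append(vec)
    return clusters

# Toy "embeddings" -- a sentence transformer would produce these in practice.
notes = {
    "linux-tips": [0.9, 0.1, 0.0],
    "bash-tricks": [0.8, 0.2, 0.1],
    "pasta-recipe": [0.0, 0.1, 0.9],
}
groups = cluster(notes)
print(groups)  # the two shell notes group together, the recipe stays alone
```

A production version would likely use a proper algorithm (k-means, HDBSCAN) instead of this greedy pass, but the interface stays the same: embeddings in, note groups out.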

A small demonstration in a Python notebook can be tested here:

Thanks


Hi @naveensaini,

Thanks for sharing the proposal. The overall approach looks interesting, especially the idea of extracting concepts from notes and building relationships between them.

One thing I’m curious about is how this would integrate with Joplin’s current architecture. Since notes, search, and metadata are already handled in the core library services, it might be worth exploring whether relationship generation could be built on top of those instead of introducing a separate graph database.
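One relationship source Joplin already has is its internal-link syntax, `[title](:/noteid)` with a 32-character hex note ID, so a first pass could derive edges with no new storage at all. A hedged sketch: the note dicts below imitate the `id`/`body` fields Joplin's data API returns, but fetching them is out of scope here, and `link_edges` is an illustrative name, not an existing Joplin function:

```python
import re

# Joplin internal note links look like [title](:/<32 hex chars>).
LINK_RE = re.compile(r":/([0-9a-f]{32})")

def link_edges(notes: list[dict]) -> set[tuple[str, str]]:
    """Derive (source_id, target_id) edges from internal links in note bodies."""
    known = {n["id"] for n in notes}
    edges = set()
    for note in notes:
        for target in LINK_RE.findall(note["body"]):
            if target in known and target != note["id"]:
                edges.add((note["id"], target))
    return edges

a, b = "a" * 32, "b" * 32  # fake note IDs for the demo
notes = [
    {"id": a, "body": f"See also [other note](:/{b})."},
    {"id": b, "body": "No outgoing links here."},
]
print(link_edges(notes))  # one edge, from note a to note b
```

AI-derived edges could then be layered on top of this explicit-link graph rather than replacing it.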

file-structure.txt (39.4 KB)

The clustering idea also sounds useful, especially for users with many notes across different topics.

A small prototype showing concept extraction and a simple graph from a few notes could help demonstrate how this might work in practice.

Hello Kaushalendra-Marcus,

Thanks for the reply. I will rework my proposal to take Joplin’s existing architecture into account, as I also believe a separate graph database at this stage would bring little benefit relative to the complications it would introduce.

I had linked a small demonstration in my earlier post, which uses spaCy and NetworkX to create the graph and Pyvis to visualise it.

https://colab.research.google.com/drive/18bAMWqUN_JDAZc99h4wUMhThqp3JA_Cp?usp=sharing

I will improve upon it by using key phrases rather than entity extraction to derive relationships between the notes in the graph.
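One simple way to turn key phrases into edges is Jaccard overlap between the phrase sets of two notes. A sketch with hand-picked phrases (KeyBERT would supply them in practice, and the 0.3 threshold is an arbitrary choice):

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two phrase sets: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def edges(phrases: dict[str, set[str]], threshold: float = 0.3) -> dict:
    """Weighted edges between every note pair whose overlap clears the threshold."""
    names = sorted(phrases)
    return {
        (u, v): jaccard(phrases[u], phrases[v])
        for i, u in enumerate(names)
        for v in names[i + 1:]
        if jaccard(phrases[u], phrases[v]) >= threshold
    }

# Hand-picked key phrases standing in for KeyBERT output.
phrases = {
    "note1": {"knowledge graph", "entity extraction", "joplin"},
    "note2": {"knowledge graph", "vector embeddings", "joplin"},
    "note3": {"pasta recipe", "olive oil"},
}
result = edges(phrases)
print(result)  # only note1-note2 share enough phrases to get an edge
```

The resulting weights can be fed straight into NetworkX as edge attributes, so this slots into the existing Pyvis demo without structural changes.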

@naveensaini, also remember that you need to create a few pull requests first, as we need these to evaluate your proposal later on.

First of all, please sketch a visual/schematic demo, because it’s hard to comprehend how you see the result:

  • is it a note being linked to other notes visually because they share one of many concepts?
  • or is it a note’s heading/anchor (section) linked to another note’s heading because of a shared concept?

This is a very ambitious project, the way I see the idea.

And to my understanding, using such a tool is going to be really expensive in terms of LLM tokens.

I once tried setting up GraphRAG for Joplin, and it was the most expensive approach of all I had seen at that time. I’m not saying GraphRAG is 100% what is needed here, because it’s for RAG obviously, but many of the concepts, prompts, and algorithms it employs should be very similar for graph visualisation.

Don’t take my word for it; just check what it takes, algorithmically speaking, for GraphRAG to parse and link all the entities and relations for a couple of average-sized notes. Then work out how much that costs with, say, OpenAI’s gpt-4o model. Then multiply that by whatever you assume is the average number of notes folks have.
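A back-of-envelope version of that multiplication, just to show the shape of the calculation. Every number here is an assumption for illustration, not a measured GraphRAG figure or a current OpenAI price:

```python
# All figures below are assumptions -- substitute real measurements and prices.
notes = 2_000                  # assumed size of an average notebook
tokens_per_note = 5_000        # assumed prompt + completion tokens for entity/relation extraction
usd_per_1m_tokens = 5.0        # hypothetical blended price for a gpt-4o-class model

total_tokens = notes * tokens_per_note
cost_usd = total_tokens / 1_000_000 * usd_per_1m_tokens
print(f"{total_tokens:,} tokens ~ ${cost_usd:.2f}")  # 10,000,000 tokens ~ $50.00
```

Even if each assumed figure is off by half, the point stands: indexing an entire notebook through an LLM is a significant upfront cost that re-runs whenever notes change.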

That’s the dollar amount (not a pleasant one, I assume) you’ll basically be asking a user to pay upfront after configuring an LLM API key. And that’s not even counting how much a developer is going to spend out of their own pocket, because it’s almost impossible to find free cloud LLMs intelligent enough for this task.

I would love to see the idea succeed, but I’m a nerd. I’m not sure how the majority of users are going to like it, given the price and niche functionality.

Hello Laurent,
I am aware of this and am actively searching for an issue that matches my skill set. By this weekend (14.03.2026), I expect to make a substantial contribution to the Joplin repo.
Thanks.

Thanks @executed,
I sincerely appreciate your feedback, which provides a good reality check on the objectives and deliverables of this project. In the first iteration, I propose to use open-source models and Python libraries to create the graph of notes. This stage will act as a solid base for working on the user interface. Once that is integrated with the Joplin code base, we may use an LLM at a later stage to refine relationship extraction. That way, users will have the option of opting into an LLM if it suits their use case.

That being said, I will conduct a detailed study of your suggestion and will get back to you tonight (Indian Standard Time).

Thanks again.


Hello all,

I researched GraphRAG further; as per my understanding:

GraphRAG consists of three steps:

  1. Use a knowledge graph for retrieval
  2. Augment the retrieved information with the user query and other context
  3. Generate an answer to the user query using an LLM

Within this project’s scope, we may have to limit ourselves to generating the knowledge graph.

To generate the knowledge graph, we can use KeyBERT to extract entities and relations, as I have done in a small PoC: I took three text files of about 500 words each on related topics, and I could create the graphs attached here:

Further, I read this article, https://neo4j.com/blog/news/graphrag-python-package/, which provides a schematic of the pipeline to create a knowledge graph, as follows:

An LLM can be used as the entity and relation extractor for refined results; at the initial stage, however, we can utilise vector embeddings with libraries like spaCy and KeyBERT.