GSoC 2026 Proposal Draft – Idea 2: AI generated note graph – naveensaini

1. Introduction

I completed my computer science engineering degree in 2022 and, after a brief stint at a startup working on machine learning projects, enrolled in a Master's program in Political Science in 2024. At the startup I contributed to deploying a chatbot that uses an NLP pipeline to generate replies from internal corporate data. My main experience is with Python for the ML part, Node.js for backend applications, and React for the front end. I will be completing my Master's shortly and would like to contribute to open source in this period before taking up a job again.

My motivation for this project and Joplin

I have tried multiple apps, such as Notion, Evernote, and Obsidian, for creating my private knowledge space. However, I repeatedly discontinued their use because I felt I wasn't the real owner of my data. Further, with the introduction of AI, many major note-taking apps now process personal data that we may never want to be subjected to an AI use case.

Joplin, as an open source note-taking app, provides a secure platform for creating a private knowledge space. I would therefore like to be part of this wonderful organisation and contribute my efforts towards an efficient as well as secure note-taking app.

2. Project Summary

The goal of this project is to help users organise their notes and visualise the dependencies between them using a graph.

The AI would analyse all the notes in a notebook or in sub-notebooks, categorise them, and discover how they are connected to each other.

Explain the project:

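  • What problem it solves : As notes accumulate, related information ends up scattered across notebooks with no overview of how it fits together. An auto-generated knowledge graph built from the notes makes this scattered data visible.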

  • Why it matters to users : As a notes collection grows, it becomes increasingly difficult to refer back to earlier notes or to understand how existing notes relate to each other. It may also lead to redundant entries of similar information. If the notes can be visualised as a graph of nodes, it becomes easier to make sense of existing notes and to store new information where it belongs.

  • What will be implemented : Since Joplin is available on desktop as well as mobile platforms, the implementation should be lightweight, and it should also be scalable for future use cases. Therefore, development can start with NLP-based graph generation: identify the key concepts of each note and derive their relations to other notes. Later, a local or cloud LLM can be integrated to refine the graph feature.

  • Expected outcome : A knowledge graph whose nodes represent the entities/concepts extracted from the user’s notes and whose edges represent the relationships between them.

  • **I believe I have expanded the scope of the project beyond what was described in the project idea list. The idea only mentioned linking the notes, whereas this proposal covers creating a knowledge graph from existing data and updating the graph when notes change.

    However, this approach will also enable us to improve the search feature of the Joplin app, as a knowledge graph is the first step in implementing GraphRAG, which is widely used to generate answers grounded in user data with fewer hallucinations.**

3. Technical Approach

Describe how you plan to implement the project:

  • Architecture or components involved : We create a pipeline in which the user can supply the AI tools of their choice for the following four tasks:

    • Entity extraction
    • Deduplication of entities
    • Deriving relationship between entities
    • Type of Knowledge graph representation

  • Changes to the Joplin codebase : We can start by developing this as a plugin, which gives users the option of opting out of the AI features.

  • Libraries or technologies you plan to use : To implement this project, we can use the following open source tools :

    • Concept extraction from a note : We can use entity recognition and keyword extraction libraries like spaCy and KeyBERT, and use the extracted concepts as nodes in our graph.

    • Establishing relationships between notes : We can use sentence transformers to create embeddings and store them in a vector store like FAISS or Chroma. Relationships between nodes are then derived from these embeddings using semantic similarity, or from concept co-occurrence.

    • Graph databases can be used to store these results and to visualise the data. For example, Neo4j for storage and Neo4j Bloom for visualisation.

    • We may also integrate an LLM, either local (for example through Ollama or llama.cpp) or cloud-hosted, as per our use case.

  • Potential challenges :

    • Quality vs cost in entity extraction : for extracting the entities that will act as nodes in the graph, LLMs are generally considered the better but more expensive option; as a cheaper substitute we can use NLP-based extraction with spaCy or KeyBERT.
    • Updating the knowledge graph after the notes change
    • Storing entity and relationship data efficiently
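The relationship step described in this section (embed each note, then link notes whose embeddings are similar) can be sketched with the standard library alone. This is only an illustration under stated assumptions: the `embed` function is a bag-of-words stand-in for a sentence-transformer model, the pairwise loop stands in for a vector store such as FAISS or Chroma, and the function names and the `0.3` threshold are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def embed(text: str) -> Counter:
    """Stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_edges(
    notes: dict[str, str], threshold: float = 0.3
) -> list[tuple[str, str, float]]:
    """Return (note_a, note_b, score) for every pair at or above the threshold."""
    vectors = {note_id: embed(body) for note_id, body in notes.items()}
    edges = []
    for left, right in combinations(sorted(vectors), 2):
        score = cosine(vectors[left], vectors[right])
        if score >= threshold:
            edges.append((left, right, round(score, 3)))
    return edges


# Hypothetical notes: two about graph databases, one unrelated.
notes = {
    "n1": "Graph databases store nodes and edges",
    "n2": "Neo4j is a graph database with nodes and edges",
    "n3": "Recipe for lemon cake",
}
edges = similarity_edges(notes)
```

With real embeddings only `embed` would change; the thresholding logic stays the same, which is one reason to keep the two steps separate.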

4. Implementation Plan

Week 1-2:

Build the project as a modular pipeline in which the user can either supply an API token or use an offline model. Implement pipeline APIs that let users select the service of their choice for each stage :

  1. Entity extractor : NLP (spaCy and KeyBERT), LLM (local or cloud)

  2. De-duplication of entities : NLP (FAISS) or LLM (local or cloud)

  3. Relationship derivation : NLP (Coreference), LLM

  4. Graph visualisation : Pyvis or Neo4j Bloom
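One possible shape for such a modular pipeline is sketched here, assuming each stage is defined by a small interface so that an NLP or LLM backend can be swapped in. All class and method names are hypothetical, the three toy implementations are placeholders for the real spaCy/KeyBERT/FAISS/LLM backends, and the visualisation stage is left out.

```python
import re
from dataclasses import dataclass
from typing import Protocol


class EntityExtractor(Protocol):
    def extract(self, text: str) -> list[str]: ...


class Deduplicator(Protocol):
    def merge(self, entities: list[str]) -> list[str]: ...


class RelationFinder(Protocol):
    def relate(self, text: str, entities: list[str]) -> list[tuple[str, str]]: ...


class CapitalisedWordExtractor:
    """Toy extractor: treats capitalised words as entities."""

    def extract(self, text: str) -> list[str]:
        return re.findall(r"\b[A-Z][a-zA-Z0-9]+\b", text)


class CaseInsensitiveDeduplicator:
    """Toy deduplicator: keeps the first spelling of each case-folded name."""

    def merge(self, entities: list[str]) -> list[str]:
        first_seen: dict[str, str] = {}
        for entity in entities:
            first_seen.setdefault(entity.lower(), entity)
        return list(first_seen.values())


class SentenceCooccurrence:
    """Toy relation finder: links entities that share a sentence."""

    def relate(self, text: str, entities: list[str]) -> list[tuple[str, str]]:
        pairs = set()
        for sentence in re.split(r"[.!?]", text):
            present = [e for e in entities if e.lower() in sentence.lower()]
            for i, first in enumerate(present):
                for second in present[i + 1:]:
                    pairs.add((first, second))
        return sorted(pairs)


@dataclass
class Pipeline:
    extractor: EntityExtractor
    deduplicator: Deduplicator
    relation_finder: RelationFinder

    def run(self, text: str) -> tuple[list[str], list[tuple[str, str]]]:
        entities = self.deduplicator.merge(self.extractor.extract(text))
        return entities, self.relation_finder.relate(text, entities)


pipeline = Pipeline(
    CapitalisedWordExtractor(), CaseInsensitiveDeduplicator(), SentenceCooccurrence()
)
entities, relations = pipeline.run(
    "Joplin syncs with Nextcloud. Joplin supports Markdown."
)
```

Because `Pipeline` depends only on the three interfaces, switching a stage from an offline model to a cloud LLM would mean passing a different object, not changing the pipeline itself.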

Week 3-4:

Integrate and test the pipeline to create a knowledge graph from existing data, as shown in the image below

Week 5-6:

Implement the architecture for updating the knowledge graph when the data changes. We may use the following architecture:

In this architecture, every time a new note is added or an existing one is modified, the changes are fed into a BERT processor, which extracts concepts. These concepts are then fed into the graph, where they are propagated and the affected nodes are modified.
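A minimal sketch of the graph side of this update flow follows, assuming concept extraction has already happened upstream and each note arrives as a set of concept strings. The `ConceptGraph` class and its method names are hypothetical; the point is only that an edited note changes the edges that differ, without rebuilding the whole graph.

```python
from collections import defaultdict


class ConceptGraph:
    """Bipartite note/concept index that supports incremental updates."""

    def __init__(self) -> None:
        self.note_concepts: dict[str, set[str]] = {}
        self.concept_notes: dict[str, set[str]] = defaultdict(set)

    def upsert_note(self, note_id: str, concepts: set[str]) -> None:
        """Add a new note or apply the diff for an edited one."""
        old = self.note_concepts.get(note_id, set())
        for concept in old - concepts:  # concepts the edit removed
            self._detach(concept, note_id)
        for concept in concepts - old:  # concepts the edit introduced
            self.concept_notes[concept].add(note_id)
        self.note_concepts[note_id] = set(concepts)

    def remove_note(self, note_id: str) -> None:
        for concept in self.note_concepts.pop(note_id, set()):
            self._detach(concept, note_id)

    def _detach(self, concept: str, note_id: str) -> None:
        linked = self.concept_notes[concept]
        linked.discard(note_id)
        if not linked:  # drop concept nodes that no note refers to any more
            del self.concept_notes[concept]

    def related_notes(self, note_id: str) -> set[str]:
        """Notes that share at least one concept with the given note."""
        related: set[str] = set()
        for concept in self.note_concepts.get(note_id, set()):
            related |= self.concept_notes[concept]
        related.discard(note_id)
        return related


graph = ConceptGraph()
graph.upsert_note("n1", {"graph", "neo4j"})
graph.upsert_note("n2", {"graph", "rag"})
related_before_edit = graph.related_notes("n1")

# The user edits n1 and the "graph" concept no longer appears in it.
graph.upsert_note("n1", {"neo4j"})
related_after_edit = graph.related_notes("n1")
```

Only the concepts in the set difference are touched on each edit, so the cost of an update depends on the size of the change, not on the size of the notebook.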

Week 7-8:

Create the plugin UI where users can view the knowledge graph extracted from their notes, using the Pyvis and networkx libraries

Week 7-8 (continued):

Add functionality to the graph view so that the user can interact with the notes:

  1. Hovering : shows the source notes from which an entity was derived

  2. Modification : the user can rearrange the links and relationships between entities

Week 9-10:

User Documentation and Testing of the features developed during the program

Week 11-12:

Lay the foundation for a shared infrastructure of AI services, in which the graph developed here would serve as the first step towards GraphRAG.

5. Deliverables

What will exist at the end of the project?

  • Concept extraction / summarisation of individual note entries

  • Relationships established between such concepts

  • This data stored in a readily available form

  • A UI for visualisation of and interaction with this data

  • Tests

  • Documentation of the plugin and of its further development towards GraphRAG

6. Availability

  • Weekly availability during GSoC : I will be available for 6-7 hours every day on weekdays and 4-5 hours on weekends if required

  • Time zone: I currently reside in the Indian Standard Time zone (IST, UTC+5:30)

  • Any other commitments during the programme: I don't have any professional obligations during the program, but I will have a few academic commitments arising from my enrolment in the Master's course in Political Science.

A small proof of concept is as follows:

For updating the knowledge graph when notes change, I am analysing the following papers:

KGGen: Extracting Knowledge Graphs from Plain Text with Language Models : https://arxiv.org/pdf/2502.09956

One-Shot Relational Learning for Knowledge Graphs (arXiv:1808.09040)

Language is All a Graph Needs (arXiv:2308.07134); code on GitHub: agiresearch/InstructGLM

Huguet Cabot, Pere, and Roberto Navigli. "REBEL: Relation Extraction by End-to-End Language Generation." Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021, pp. 7829–7844.

Hi! Are you sure that there’s working spaCy in Javascript? Your POC is in Python.