Yes, a pre-processing step will help a lot here. I will filter out anything that is clearly generic ("Untitled", empty titles, "Note 1"-style patterns) before applying any weighting.
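As a rough sketch of that filter (the exact pattern list is an assumption; "New note" is my own addition and the regex would grow as more generic patterns show up):

```python
import re

# Hypothetical generic-title filter: matches empty titles, "Untitled",
# "Note 1"-style numbered defaults, and "New note". Case-insensitive.
GENERIC_TITLE = re.compile(r"^\s*(untitled|note\s*\d+|new note)?\s*$",
                           re.IGNORECASE)

def is_useful_title(title: str) -> bool:
    # Keep the title only if it does NOT match a generic pattern.
    return not GENERIC_TITLE.match(title)
```

Anything that fails this check would simply get a title weight of zero, so the note falls back to its body vector alone.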
After filtering, there are two ways to decide how much weight to give the remaining titles:
- Word count: give more weight to longer titles and less to shorter ones, e.g. a 6-word title gets 0.3, decreasing from there. This is simple and fast, but if a title is long yet unrelated to the note body, it still gets a high weight and distorts the cluster it lands in.
- Cosine similarity: embed the title separately and compare it to the body_avg_vector. If they're talking about the same thing, similarity is high and the title gets more weight; if they're mismatched, similarity drops and the weight shrinks automatically. This is the better approach, but it adds one extra embedding call per note (roughly 60 extra seconds for 2000 notes, paid only once at the start).
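Both weighting schemes can be sketched in a few lines. The 6-words-gets-0.3 scaling and the 0.3 cap are the numbers from above; clamping negative similarity to zero is my own assumption so a badly mismatched title never pushes the vector the wrong way:

```python
import numpy as np

def word_count_weight(title: str, max_weight: float = 0.3) -> float:
    # Heuristic: weight grows with title length, reaching max_weight
    # at 6 words (the scaling point mentioned above) and capping there.
    words = len(title.split())
    return min(words / 6.0, 1.0) * max_weight

def cosine_weight(title_vec: np.ndarray, body_avg_vec: np.ndarray,
                  max_weight: float = 0.3) -> float:
    # Weight the title by how well it matches the body: cosine similarity
    # is clamped to [0, 1] and scaled by max_weight, so a mismatched
    # title automatically contributes little or nothing.
    sim = float(np.dot(title_vec, body_avg_vec) /
                (np.linalg.norm(title_vec) * np.linalg.norm(body_avg_vec)))
    return max(sim, 0.0) * max_weight
```

A title identical in meaning to the body would get the full 0.3; an orthogonal (unrelated) one gets 0.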
How this will help:
For short notes the body average is already a great metric; even without the title it gives good results. But for longer notes that cover multiple topics, the average vector gets blurry (it smears several subjects together), and a good title pulls the final vector back toward the note's actual main topic, scaled by the cosine similarity.
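One simple way to do that pull, assuming a linear blend (the exact combination rule isn't fixed yet, so this is a sketch):

```python
import numpy as np

def final_note_vector(title_vec: np.ndarray, body_avg_vec: np.ndarray,
                      title_weight: float) -> np.ndarray:
    # Linear blend: a weight of 0 keeps the body average untouched,
    # while a high weight pulls the result toward the title's topic.
    combined = title_weight * title_vec + (1.0 - title_weight) * body_avg_vec
    # Re-normalize so cosine comparisons downstream stay consistent.
    return combined / np.linalg.norm(combined)
```

With `title_weight = 0` (a filtered-out generic title) this degrades gracefully to the plain body vector, so short notes are unaffected.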
Analysis:
On a 2000-note collection, probably 300–400 of the longer notes would benefit noticeably. The extra cost is around 60 seconds on the first run, and after that everything is cached.
How should I proceed?