1. Progress
UI/UX
In week 6, I decided to focus fully on the UI/UX of the AI summarisation feature. If you have any feedback or input, feel free to share!
1.2 Created note summarisation dialog
When users right-click on a note, a new dialog pops up. It lets users control the length of the summary, the choice of algorithm, and the content of the summary, as shown in the video.
1.3 Created notebook summarisation dialog
When users right-click on a notebook, a new dialog pops up. It lets users control the length of the summary, the choice of algorithm, and the number of notes to be summarised.
Users have the option to:
- Summarise only the notes directly inside the selected notebook (immediate children only)
- Summarise all notes in the selected notebook
(In the future: an option that lets users pick which notes in a notebook to summarise.)
In the upcoming weeks, I will implement a feature that lets users click on the individual notes they want to summarise and edit them one by one if they wish.
Ideas:
-> project-tree like structure to visualise notes and notebooks in notebook dialog
-> or folder-like structure to visualise notes and notebooks in notebook dialog
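The two notebook scopes listed above could be sketched roughly as follows. This is only an illustration: the `Note` and `Notebook` shapes and the `collectNotes` helper are hypothetical and are not Joplin's data API. It also assumes that "all notes" means the notes of nested notebooks are included.

```typescript
// Hypothetical shapes for illustration only; not Joplin's actual data API types.
interface Note { id: string; parentId: string; title: string; }
interface Notebook { id: string; parentId: string; title: string; }

type Scope = "immediate" | "all";

// Returns the notes to summarise for the notebook with id `rootId`.
// "immediate": only notes whose parent is the selected notebook.
// "all": also notes of every nested notebook (assumption, see above).
function collectNotes(
  rootId: string,
  notebooks: Notebook[],
  notes: Note[],
  scope: Scope
): Note[] {
  const notebookIds = new Set<string>([rootId]);
  if (scope === "all") {
    // Breadth-first walk over the sub-notebooks of the selected notebook.
    const queue: string[] = [rootId];
    while (queue.length > 0) {
      const current = queue.shift() as string;
      for (const nb of notebooks) {
        if (nb.parentId === current && !notebookIds.has(nb.id)) {
          notebookIds.add(nb.id);
          queue.push(nb.id);
        }
      }
    }
  }
  return notes.filter((n) => notebookIds.has(n.parentId));
}
```

The same walk could also feed a tree or folder view in the dialog, since it visits notebooks level by level.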
AI
1.4 Dimensionality reduction
In the weekly updates post, a few users (@BioFacLay and @Imperial_Squid) recommended applying dimensionality reduction to the word2vec embeddings, since it can improve clustering performance on high-dimensional data.
There are several dimensionality reduction techniques:
- UMAP (umap-js - npm)
- t-SNE (tsne-js - npm)
- PCA (GitHub - mljs/pca: Principal component analysis)
Before choosing one, I need to understand each of these techniques and its weaknesses.
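To make the idea concrete, here is a minimal PCA sketch in plain TypeScript, assuming each row of `X` is one embedding vector (for example an averaged word2vec vector per sentence). It uses power iteration with deflation rather than any of the libraries listed above, and all function names are hypothetical. It is meant for understanding the technique, not as the plugin's implementation.

```typescript
// Dot product of two equal-length vectors.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Subtract the column means so every feature has zero mean.
function center(X: number[][]): number[][] {
  const n = X.length;
  const d = X[0].length;
  const mean: number[] = new Array(d).fill(0);
  for (const row of X) for (let j = 0; j < d; j++) mean[j] += row[j] / n;
  return X.map((row) => row.map((v, j) => v - mean[j]));
}

// Sample covariance matrix (d x d) of centred data.
function covariance(Xc: number[][]): number[][] {
  const n = Xc.length;
  const d = Xc[0].length;
  const C: number[][] = Array.from({ length: d }, () => new Array(d).fill(0));
  for (const row of Xc)
    for (let i = 0; i < d; i++)
      for (let j = 0; j < d; j++) C[i][j] += (row[i] * row[j]) / (n - 1);
  return C;
}

// Power iteration: repeatedly apply C and renormalise to approach the
// eigenvector with the largest eigenvalue. The fixed start vector keeps the
// result deterministic; it fails if that start is orthogonal to the answer.
function topEigenvector(C: number[][], iterations = 200): number[] {
  let v: number[] = Array.from({ length: C.length }, (_, i) => 1 / (i + 1));
  for (let it = 0; it < iterations; it++) {
    const w = C.map((row) => dot(row, v));
    const norm = Math.sqrt(dot(w, w));
    if (norm === 0) break;
    v = w.map((x) => x / norm);
  }
  return v;
}

// Project X (n x d) onto its first k principal components (n x k).
function pca(X: number[][], k: number): number[][] {
  const Xc = center(X);
  let C = covariance(Xc);
  const components: number[][] = [];
  for (let c = 0; c < k; c++) {
    const v = topEigenvector(C);
    const lambda = dot(v, C.map((row) => dot(row, v)));
    components.push(v);
    // Deflation: remove the component just found before searching for the next.
    C = C.map((row, i) => row.map((x, j) => x - lambda * v[i] * v[j]));
  }
  return Xc.map((row) => components.map((comp) => dot(row, comp)));
}
```

Unlike PCA, UMAP and t-SNE are non-linear and stochastic, so their output can change between runs unless a seed is fixed; that is one of the trade-offs worth checking before feeding the reduced vectors into a clustering algorithm.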
1.5 More clustering algorithms
Furthermore, I've also been advised to explore more clustering algorithms (especially density-based and hierarchical ones):
- HDBSCAN
- Mean Shift (used in computer vision, might not be good for textual data)
1.6 Topic Modelling
A technique to uncover hidden thematic structures in a large collection of documents.
-> it helps identify and categorise the main topics in a text, for example with LDA. Might be useful?
Sources:
Universal Web Worker in Joplin
In the past, running Tesseract.js with its .wasm instances inside a plugin was a problem. It was resolved by integrating Tesseract.js into the main Joplin app as a web worker:
Since the bonding period, I have had a similar problem running Transformers.js and ONNX in a plugin, as both contain .wasm instances. Creating another web worker in the main app might therefore work.
Evaluation research / survey
This is still under discussion with @Daeraxa. We will come to a conclusion soon and upload the survey in week 7.
Others
- added logs with electron-log
- if we use unsupervised methods for extractive summarisation, there is no need to create UI pop-up windows that notify users about the summarisation process. Those might be useful for abstractive summarisation and for summarising multiple notes or notebooks.
2. Plans
Firstly
- create another dialog that opens when users right-click 'Summarise the note' in EditorContextMenu with text selected in the editor
- in the notebook dialog, visualise all the notes and notebooks so that users can click on them and edit them
- check how long it takes to summarise all the notes in a notebook and store the result in a JSON file -> it might be useful to implement a UI pop-up window that notifies users about the summarisation process
- improve the styles of the dialogs
- [PENDING from week 6] update the GitHub README -> and then release the plugin
- finish and submit the midterm evaluation
- upload the survey for the evaluation research
Afterwards
- look into creating a minimal web worker in Joplin to use Transformers.js, ONNX, etc.
- figure out whether we can generalise the 'web worker' flow for other contributors
- it seems that Tesseract.js has a built-in web worker. Therefore, we have to create our own 'web worker' flow
- look into dimensionality reduction, HDBSCAN, and other clustering algorithms to understand the theory
3. Problems
- [RESOLVED] Making the dialog communicate with the plugin was a problem, since there are no onMessage and postMessage events for Joplin plugin dialogs. Other users have come across this problem before: Dialog <-> Plugin Communication
- Solution: pre-populate the summaries in a JSON file and run the predictions in batches to improve performance
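The pre-population idea could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `SummaryOptions` shape, the cache key format, and the `prepopulate` and `lookup` helpers are hypothetical, and the summariser is passed in as a plain function instead of the real model. The point is that every summary the dialog might need is computed in one batch up front and serialised as JSON, so the dialog only has to read it and never needs to message the plugin.

```typescript
// Hypothetical option set mirroring the dialog controls (length, algorithm).
interface SummaryOptions { length: number; algorithm: string; }
interface NoteInput { id: string; body: string; }
type Summariser = (body: string, opts: SummaryOptions) => string;
type SummaryCache = Record<string, string>;

// One cache entry per (note, algorithm, length) combination.
function cacheKey(noteId: string, opts: SummaryOptions): string {
  return `${noteId}|${opts.algorithm}|${opts.length}`;
}

// Compute all summaries in one batch and return them as a JSON string,
// which could then be written to a file or embedded in the dialog HTML.
function prepopulate(
  notes: NoteInput[],
  optionGrid: SummaryOptions[],
  summarise: Summariser
): string {
  const cache: SummaryCache = {};
  for (const note of notes) {
    for (const opts of optionGrid) {
      cache[cacheKey(note.id, opts)] = summarise(note.body, opts);
    }
  }
  return JSON.stringify(cache);
}

// Dialog side: read a summary from the pre-populated JSON; no messaging needed.
function lookup(
  json: string,
  noteId: string,
  opts: SummaryOptions
): string | undefined {
  const cache = JSON.parse(json) as SummaryCache;
  return cache[cacheKey(noteId, opts)];
}
```

The trade-off is that the cache grows with the number of option combinations, so the option grid should stay small, and the timing check planned above will show whether pre-computing a whole notebook is acceptable.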