Plugin: Jarvis (AI assistant) [v0.8.5, 2024-06-04]

Good comment, thanks, I'll add it.

@shikuz
I have been experimenting with the use case of a support chatbot. Could you please add these features so that I can better see what is happening under the hood?

Detailed preview
I know I can “preview chat Notes context”, but this does not show me the whole chunk. I would like to see everything that is inside. If I click on it, it takes me to the note, but even there I don’t know exactly where the chunk begins. I would also like to see all the metadata. If I remember correctly, different embedding models might also display different metadata, so this might be important to see. I also need this to check whether my notes are short enough to fit the chunk size I set in the settings. I want to make sure that Jarvis gets the whole context it needs.

Prompt preview
If I am not mistaken, you send something like this to the LLM of choice: “System Prompt”, followed by “context” filled with the relevant chunks and whatever is entered after the Jarvis setting keywords “Context:” and “Not Context”. I would like to see in what order these are sent. I think the order is very important for the LLM to understand the text correctly.

Semantic similarity
Could you display in the side panel how similar the notes actually are (as a number)? This could help me better understand how different formulations affect the similarity, or how good different embedding models are.

Dynamic note similarity
I think this feature could potentially have a big impact. The user would be able to set a range (for example 100 - 75) and a number of desired notes (for example 3). Jarvis would then start with the highest note similarity and go down until it finds enough similar notes (3) or until it reaches the bottom threshold (75). I think this setting better reflects what we want. We don’t always want the highest note similarity as this might reduce the number of notes found, but we also don’t want junk notes to get in the way.
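
A minimal Python sketch of the selection rule requested above. It is only an illustration, not Jarvis code: the function `select_notes`, its parameters `floor` and `wanted`, and the sample scores are all hypothetical, and similarities are written as percentages to match the 100 - 75 example.

```python
from typing import List, Tuple


def select_notes(
    scored: List[Tuple[str, float]],
    floor: float = 75.0,
    wanted: int = 3,
) -> List[Tuple[str, float]]:
    """Walk down from the most similar note and stop once `wanted` notes
    are collected or the similarity drops below `floor`."""
    picked: List[Tuple[str, float]] = []
    for title, score in sorted(scored, key=lambda item: item[1], reverse=True):
        if score < floor or len(picked) >= wanted:
            break
        picked.append((title, score))
    return picked


# Hypothetical note titles and similarity scores (in percent).
notes = [
    ("Router reset", 91.0),
    ("Wi-Fi setup", 83.5),
    ("Billing FAQ", 62.0),
    ("Firmware update", 78.2),
    ("Returns", 76.0),
]
print(select_notes(notes))
```

With these sample scores the three best notes are returned even though a fourth one is still above the floor, and raising `floor` to 90 would return only one note instead of padding the result with weaker matches.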

Other Chunking Methods
Have you looked into other chunking strategies? I think I saw a method where an embedding model is used to decide which chunks are similar to each other, and then they are put into groups to preserve context. Or the less advanced chunking that divides the text by paragraphs.

Bigger field “Chat: System message”
This is not that important but would be a nice feature to have as I tweak this a lot.

I have also some questions regarding these settings:

Chat Memory tokens - does this actually apply to Chat with your notes? Does chat with your notes have any chat history, or do I have to specifically add it to “Not context/Context” to let Jarvis remember what we wrote about?

Notes: Aggregation method for note similarity - What exactly happens here when the notes are ranked?

Anyway as I am going down this rabbit hole I appreciate how much work you have put in this. Many other tools I tried are not as advanced in settings as what you did! Or they are more advanced but you basically have to program everything yourself…

Thanks for all the suggestions @davadev, great to hear that you're in continuous experimentation, and striving to make Jarvis better. :slight_smile: I am still using and maintaining Jarvis, and I plan to release a small (long overdue) update soon, but I'll admit that my resources are currently limited, especially for new features. I'll try to see if I can squeeze some of these features into the next release.

Detailed preview / Prompt preview: There is actually a bug that I noticed only recently, where the scroll function to the exact chunk in the note is dependent on the Bundle plugin being installed (it works even if you hide the panel of this plugin). Until it's fixed (next release), you may install this plugin, and get to the exact chunk by clicking on the context preview. The entire prompt, including the context, is printed to a log, but it appears that this is only available in dev mode. I can add a preview dialog that shows the entire prompt, including the system message, chat history, note chunks, and prompt. In the meantime, you may also use a self-hosted LLM server in order to view what's being sent to the AI in the server log / terminal.

Semantic similarity: I can add an option to display the similarity score.

Dynamic note similarity: In a way this is already implemented, if I understand your comment. The settings include "Notes: Minimal note similarity", which is the bottom threshold. For the note similarity panel, the setting "Notes: Maximal number of notes to display" is exactly the number of desired notes. I can apply this setting also to the chat context. For chat with your notes, the setting "Chat: Memory tokens" determines both the total length of the chat history to be included, and the length of the context that will be extracted from notes. The context of the chat will include X tokens from your previous conversation and X tokens from the newly extracted chunks. (This also answers your first question.) So, for example, if you set it to a low number, only the top chunks above the bottom threshold will be included. Given this, you could roughly estimate the number of chunks that will be included (on average) for a given number of memory tokens.
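
To illustrate how a token budget and a similarity threshold interact, here is a rough Python sketch. It is not the plugin's code: `pack_chunks`, the `(text, similarity, n_tokens)` tuple layout, and the rule of stopping at the first chunk that does not fit are assumptions made for the example.

```python
from typing import List, Tuple

Chunk = Tuple[str, float, int]  # (text, similarity, token count), hypothetical layout


def pack_chunks(
    chunks: List[Chunk],
    min_similarity: float,
    context_tokens: int,
) -> Tuple[List[str], int]:
    """Take chunks from most to least similar until the similarity
    threshold or the token budget is hit (assumed stopping rule)."""
    selected: List[str] = []
    used = 0
    for text, similarity, n_tokens in sorted(chunks, key=lambda c: c[1], reverse=True):
        if similarity < min_similarity:
            break  # everything after this is below the bottom threshold
        if used + n_tokens > context_tokens:
            break  # budget exhausted
        selected.append(text)
        used += n_tokens
    return selected, used


# Hypothetical chunks of about 200 tokens each.
chunks = [("a", 0.9, 200), ("b", 0.8, 200), ("c", 0.7, 200), ("d", 0.3, 50)]
print(pack_chunks(chunks, min_similarity=0.5, context_tokens=450))
```

Under these assumptions a budget of 450 tokens with chunks of about 200 tokens admits two chunks, which is the kind of rough estimate (budget divided by average chunk length) mentioned above.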

Other chunking methods: Will be happy to read references if you have any to share. EDIT: I almost completely forgot about it, but in the Advanced Settings section you'll find 3 settings that are related to your suggestion. "Notes: Number of nearest blocks to add" groups X similar chunks together in the context: it starts from the top similar chunk, and then selects additional chunks that are similar to it to be bundled in the context. "Notes: Number of leading blocks to add" and "Notes: Number of trailing blocks to add" will add X chunks that appear in the same note before / after the selected chunk, to create a continuous context that extends beyond the default chunk size.
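
The leading / trailing idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the plugin's implementation: `expand_with_neighbors` and its parameter names are hypothetical, and a note is modelled simply as an ordered list of chunk strings.

```python
from typing import List


def expand_with_neighbors(
    note_chunks: List[str],
    hit_index: int,
    leading: int = 1,
    trailing: int = 1,
) -> List[str]:
    """Return the matched chunk plus up to `leading` chunks before it and
    `trailing` chunks after it, clipped to the bounds of the note."""
    start = max(0, hit_index - leading)
    end = min(len(note_chunks), hit_index + trailing + 1)
    return note_chunks[start:end]


# Hypothetical note split into five chunks; the search matched chunk 2.
note = ["c0", "c1", "c2", "c3", "c4"]
print(expand_with_neighbors(note, hit_index=2))
```

Because the slice is clipped at both ends, a match in the first or last chunk of a note simply returns fewer neighbors.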

Bigger field “Chat: System message”: Unable to change that unfortunately (Joplin limits).

Notes: Aggregation method for note similarity: This affects only the note similarity panel (ignored for notes context), and means that when sorting the notes in the panel (top similarity first) we consider either the maximally similar chunk in each note, or the average of the chunks in that note.
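
The difference between the two aggregation options can be shown with a small Python sketch. The function `rank_notes`, the method names "max" and "avg", and the sample scores are hypothetical; only the max-versus-average idea comes from the explanation above.

```python
from statistics import mean
from typing import Dict, List


def rank_notes(chunk_scores: Dict[str, List[float]], method: str = "max") -> List[str]:
    """Order notes by one score per note: either the best chunk ("max")
    or the average over all chunks in the note ("avg")."""
    if method == "max":
        aggregate = max
    elif method == "avg":
        aggregate = mean
    else:
        raise ValueError(f"unknown method: {method}")
    note_scores = {note: aggregate(scores) for note, scores in chunk_scores.items()}
    return sorted(note_scores, key=note_scores.get, reverse=True)


# Hypothetical per-chunk similarities for two notes.
scores = {
    "Long note": [0.9, 0.2, 0.1],  # one excellent chunk, the rest unrelated
    "Focused note": [0.7, 0.6],    # consistently relevant
}
print(rank_notes(scores, "max"))
print(rank_notes(scores, "avg"))
```

With these numbers the long note ranks first under "max" because of its single strong chunk, while the focused note ranks first under "avg".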

@shikuz Thank you for taking the time to explain, this makes such a difference when I actually understand what the settings do. :smile:

  • A preview dialog would be great! I suspected that some tokens were being cut off, and now, thanks to your explanation, I understand which setting was wrong. I fixed it, but a preview would make this kind of settings error more obvious.

  • I tried to start Joplin in dev mode, imported my notes, and installed Jarvis with Bundle. But when I open the log file in the dev profile, I still don’t see the prompt.

  • Can you elaborate more on this:

For the note similarity panel, the setting Notes: Maximal number of notes to display is exactly the number of desired notes. I can apply this setting also on chat context

Does this mean that Jarvis currently gets a different number of notes (chunks) than what is displayed in the panel?

  • One more wish for a feature :innocent:. I know this will differ from LLM to LLM but it would be nice to have some approximate token counter in the notes.
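
As a stopgap for the token counter wish, a very rough estimate can be computed outside Joplin. The Python sketch below assumes the common rule of thumb of about 4 characters per token for English text; the real count depends on each model's tokenizer, and `estimate_tokens` and `fits_in_chunk` are hypothetical helper names.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate based on character count (assumed ratio)."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_chunk(text: str, max_tokens: int) -> bool:
    """Check whether a note is likely to fit into one chunk of `max_tokens`."""
    return estimate_tokens(text) <= max_tokens


note_body = "Reset the router by holding the button for ten seconds."
print(estimate_tokens(note_body), fits_in_chunk(note_body, max_tokens=128))
```

Since this is only an approximation, it is safer to leave some headroom below the configured chunk size.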

Edit: Regarding Chunking strategies, I watched this video.

I seem to get the timeout error a lot when using an offline model. I suspect it is caused by doing things in the wrong order (the timeout is set high enough, and the error occurs right at the start), but I can’t really figure out what I am doing wrong. I have this suspicion:

I have to wait before sending a request to the LLM until I know Jarvis got a reply to its hello request at startup. If this is the case, it would be great if Jarvis showed a pop-up warning or prevented chat with notes until it is ready. I also suspect that changing some Jarvis settings (maybe chunk size?) might also result in a timeout, but I didn’t test it enough to be sure.

Opening the plugin in dev mode requires building it from the source, I think.

Jarvis goes through all the note chunks, sorts them by similarity to the query / current note, and uses the cutoff of the minimal threshold. This may result in 100s if not 1000s of chunks, so only the top 10 notes are displayed by default. If you're asking about what is sent to the LLM, then it is exactly what is displayed in the preview panel.
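
For readers who want to see the ranking step spelled out, here is a small Python sketch. It assumes cosine similarity between embedding vectors, which is a common choice but an assumption about the plugin; `cosine_similarity`, `rank_chunks`, and the toy 2-dimensional vectors are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_chunks(
    query_vec: List[float],
    chunk_vecs: Dict[str, List[float]],
    min_similarity: float,
) -> List[Tuple[str, float]]:
    """Score every chunk against the query, drop those below the threshold,
    and return the rest sorted from most to least similar."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in chunk_vecs.items()]
    kept = [(name, score) for name, score in scored if score >= min_similarity]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy embeddings; real ones have hundreds of dimensions.
vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(rank_chunks([1.0, 0.0], vectors, min_similarity=0.5))
```

A display limit such as "top 10 notes" would then be applied to the sorted result.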

Sounds doable. Added to the FR list.

And thanks for the video!

Sounds a little strange, that a response to hello would take so long, assuming that the local model is already up. I'll run some tests, and keep your suggestion in mind. In any case, your suspicion that Jarvis reloads the model when certain settings are set is correct.

@shikuz I was just watching some videos about AI agents. Have you thought about implementing agents in Jarvis? If so, what use cases did you have in mind?

Agents would be a cool feature! But I think it is too big for @shikuz, as he mentioned that his time is more or less limited to maintaining Jarvis. Perhaps the simplest implementation would be to have a limited number of prompts that run sequentially over the same text.

Something like this:
Prompt 1 - "Correct grammatical errors".
Prompt 2 - "Break the text into paragraphs and subheadings".
Prompt 3 - "Create a summary".

If I am not mistaken, something like this is almost possible, as you can save prompts in Jarvis that you want to use for text editing. It would just need some automation to allow you to select prompts to run in sequence over the same text...
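
The sequential idea above could look roughly like this in Python. It is a sketch only: `run_prompt_chain` is a hypothetical helper, and `fake_llm` is a stand-in that tags the text instead of calling a real model, so that the order of the steps is visible.

```python
from typing import Callable, List


def run_prompt_chain(text: str, prompts: List[str], llm: Callable[[str, str], str]) -> str:
    """Apply each prompt in order, feeding the output of one step into the next."""
    for prompt in prompts:
        text = llm(prompt, text)
    return text


def fake_llm(prompt: str, text: str) -> str:
    """Stand-in for a real model call: wraps the text with the prompt name."""
    return f"[{prompt}] {text}"


prompts = [
    "Correct grammatical errors",
    "Break the text into paragraphs and subheadings",
    "Create a summary",
]
print(run_prompt_chain("my draft note", prompts, fake_llm))
```

In a real setup `fake_llm` would be replaced by a function that sends the prompt and the current text to the chosen model and returns its reply.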

Well, I was thinking about something more complex in the context of "Chat with your notes".
Agents could abstract the Jarvis "command block".

The user would ask a question and the first agent would decide what to parse into the "context". If you start "chat with your notes" in an empty note, you will get a very good selection of notes related to your first question. However, for each subsequent question, the results get worse as it does a semantic search on the whole note. You can partially fix this by always adding your last question to "Context"; that way you'll only get notes related to the last question, which is mostly fine if you don't need to take the context of the conversation into account.

I think the optimal solution would be for an agent to customise the user's last question to include some keywords or some kind of summary from the chat history that would give the semantic search relevant contextual information, but only relevant to that question. In this way, the semantic search query is improved to give the most relevant results.

The next level could then be an agent that reasoned step by step what information it needed to answer the user's question, then performed multiple semantic searches with a summary of results for each query, and finally combined the summaries into one answer.

This is indeed an intriguing idea. I think this could solve some of the problems I face when trying to build my support chatbot! I need the chatbot to have enough contextual information (for example, what troubleshooting steps have already been taken, or what device the user is talking about...).

Adding contextual information to the query could perhaps eliminate the need to include chat history, as the context would already be included in the user's query. In this way, by treating each question as a new chat query, we could save some valuable tokens for more note chunks! (Or make using offline models with a smaller context window more useful!)

I have tried to install Xinference, but unfortunately I can't get it to work. I managed to get rid of all the Python errors, but the process still gets killed while building the wheel. A quick search on the internet suggests that hardware resources (RAM) are to blame. It seems to be much easier to run LLMs locally than to run embedding models locally.

For my private projects I stick to the English language, as it is currently the most feasible way to run something locally with few resources. I stumbled across the LLM "phi 2 Q5_K_M", which seems to be the smallest English RAG-capable model, and I am curious whether anyone can evaluate how well it works for their use case.

The default Jarvis offline embedding model has its limitations: in my latest project it didn't manage to find the right related notes, while OpenAI text-embedding-ada-002 with the same context for semantic search gets exactly the notes I need. On the plus side, I now understand the critical role of the embedding model. Maybe I can get the same result as with OpenAI ada by reformatting my notes or changing the context.

@shikuz I am thinking about how to benchmark the different offline configurations. Maybe we could add some sample notes to the repo that everyone could use to test different models (LLM/embeddings) so that we can either find out which models work best or what formatting one has to have so that even not-so-good embeddings models manage to find the relevant information. But I am currently only assuming that my problem lies in the note formatting, as to my knowledge it could also be an embeddings model lacking data in the domain of my use...

Maybe you could use Ollama to run a different embedding model. Has anyone managed to make this work with Jarvis?

„nomic-embed-text“

(Apparently I can not insert links)

This could be an option. I just checked and there are some new text feature extraction models on Hugging Face in GGUF format. In practice they can be run in any of LM Studio, Ollama, or Jan. Ideally the LLM and the embedding model would run in the same tool to save resources. I am only aware of Ollama being able to run two models at once, but the question is whether this works with the provided workaround with LiteLLM.

Great suggestions @JamesWriterNarry, I'm very interested in the directions that agents can take us. As I mentioned earlier (thanks @davadev), due to time constraints and to the fact that the next release already includes a long list of small-to-medium fixes and updates (many of them still need to be implemented), I won't be able to work on large features for the coming release, but perhaps later this year. The same goes for cool / experimental chunking methods.

In the meantime I'd like to share a pro tip that I heard recently.

Pro Tip: Better note chunking

  • One thing that can improve the discoverability of your chunks is to set a low (128-256 tokens) setting of Notes: Max tokens. The idea is that smaller chunks are more homogeneous in content and are therefore represented in more distinct embeddings. This way, search may be able to pinpoint more accurately where relevant information exists in your notebooks.
  • In order to avoid a very short context due to smaller chunks, tell Jarvis to include a few (1-2) preceding and following chunks for every chunk that it finds, using the advanced settings Notes: Number of leading blocks to add and Notes: Number of trailing blocks to add.
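
To get a feel for what a low "Notes: Max tokens" value does to a note, here is a simplified Python chunker. It is a sketch under loud assumptions: whitespace-separated words stand in for tokens, paragraphs are split on blank lines, and `chunk_text` is a hypothetical function, not how the plugin actually splits notes.

```python
from typing import List


def chunk_text(text: str, max_tokens: int = 128) -> List[str]:
    """Group paragraphs into chunks of at most `max_tokens` words,
    cutting a paragraph only when it is longer than the limit on its own."""
    chunks: List[str] = []
    current: List[str] = []
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        if not words:
            continue
        # Start a new chunk if this paragraph would overflow the current one.
        if current and len(current) + len(words) > max_tokens:
            chunks.append(" ".join(current))
            current = []
        # A paragraph longer than the limit is cut into full-size pieces.
        while len(words) > max_tokens:
            chunks.append(" ".join(words[:max_tokens]))
            words = words[max_tokens:]
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "a b c\n\nd e\n\nf g h i j k l"
print(chunk_text(sample, max_tokens=4))
```

Smaller limits produce more chunks that each cover less ground, which is why the tip pairs them with a few leading and trailing blocks.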

I'm running Xinference on 8 GB and 16 GB laptops, and it seems to work well. The 16 GB one is currently running both mistral-instruct-v0.2 (chat) and mxbai-embed-large-v1 (embedding) and is doing alright. Mistral is supported out of the box, and mxbai I configured manually (will be happy to share the JSON file). I used the pip install command to set up Xinference, but there seem to be Docker images for it as well. In fact, Mistral is so lovely (albeit a little too chatty, perhaps I should tinker with its system message) that I'm considering going completely offline (I stuck with GPT until now).

EDIT: If you are having a hard time installing Xinference, and in addition do not have an Nvidia GPU, I recommend installing Docker and doing the following (select a /path/on/your/pc to store downloaded models):

docker pull xprobe/xinference:latest-cpu

docker run --name xinfer -d -p 9997:9997 -e XINFERENCE_HOME=/data -v /path/on/your/pc:/data xprobe/xinference:latest-cpu xinference-local -H 0.0.0.0

(For GPUs see here)

There are standard benchmarks for models such as this one. Naturally, it's best to have a benchmark that is closer to your use case (e.g., chatting over notes). If you have ideas for Jarvis benchmarks I'll be happy to collaborate, post on GitHub or link to a webpage from the repo.

There are new embedding models by OpenAI, and they're supposed to be even better. I'll add them to the next release, but you can start using them via the custom model setting. Still, I think that there are offline models that perform better (such as mxbai-embed-large-v1) on the benchmarks that I linked above.

@shikuz Thanks for the chunking tip. This actually makes sense. Is there a way to add not only nearby chunks, but perhaps all chunks in the same note?

Regarding chunking, I would like to share a few thoughts from my experiments.

  1. Depending on your use case, you may face the problem that you need certain notes as a whole (e.g. a checklist or other content where integrity is very critical). For this use case it might actually be practical to have large chunks and then manually ensure that your notes are not longer than the chunk size. (That's why I'm grateful you're considering the approximate token count feature in notes.)

EDIT: Ok I think this above is not accurate as I realised that even with a large chunk size I get small chunks due to frequent use of subheadings. To have everything in one chunk would require me to avoid further subheadings, I guess...

  2. Another solution to the lack of context caused by chunking might be if Jarvis supports RAPTOR RAG in the future.
  3. Another workaround I experimented with is providing metadata through subheadings. As I remember you once mentioned that the embedding model stores the subheadings that lead to the chunks, I use H1 subheading to include a keyword that is important for the context of each chunk and that the chunks themselves may not contain. I have yet to try how tags affect this behaviour, as I am a bit worried that they might not be as "noticeable" to the LLM as H1 subheadings. I think the use of metadata will be very important to further sort out notes that are semantically similar but do not fulfil a specific requirement. My question is whether I can achieve this selection of notes containing only some tags by using the "Search" command in the Jarvis block. (Once again, thanks for the decision to have a preview window, as this will help me with further refinement.)
  4. If you are like me, your notes are not the same in all your notebooks. So it makes sense to have different chunk settings for each notebook. Would it be possible in the future to change the chunks settings on a notebook by notebook basis? By the way, this might be a bit similar to my earlier request to have different profiles in Jarvis. I just wanted to let you know that I have found a workable workaround for the profiles by simply using a different Joplin profile and importing the notes I want to experiment with. For testing this is great, but for day to day use it would be nice to be able to change the chunk settings on a notebook basis as I often change these depending on my use case and it means rebuilding the database each time.

I've read joplin-plugin-jarvis/GUIDE.md at master · alondmnt/joplin-plugin-jarvis · GitHub
I run Ollama and OpenHui on my computer, so I have Ollama accessible at http://host.docker.internal:11434 or from another terminal app.

I run LM Studio too.

But I don't know how to enter the needed values in the Jarvis configuration.

The settings in the guide are all available in the Jarvis plugin settings. I agree that sometimes it is a bit difficult to find them, but

  • "Model: OpenAI API Key"
  • "Chat: Model"
  • ...
    are all there, and only a few are hidden in the advanced options.

Can you be a bit more specific? Do you have problems with all the settings or just some?

PS: A tip for easier navigation: the prefix, like „chat“, „model“ …, is really important, as there are many settings that look quite similar :wink:

v0.8.0

It's been a while.

download file

new features

  • revamped Edit with Jarvis command (@jakubjezek001) (screenshot below)
  • Auto-complete with Jarvis command to autocomplete any text at the current cursor position
  • scroll to line of a found note chunk from the panel
  • chat context preview dialog (screenshot below)
  • token counter command
  • display note similarity score in panel

new models

  • OpenAI
    • replace text-davinci (deprecated) models with gpt-3.5-turbo-instruct (@jakubjezek001)
    • 3rd generation embedding / notes models text-embedding-3-small and text-embedding-3-large
    • chat model gpt-4-turbo: an efficient, strong model with a context window of 128K tokens
  • Google AI
    • deprecated PaLM
    • chat models gemini-1-pro and gemini-1.5-pro (a strong model with a context window of 1M tokens!)
    • embedding / notes models embedding-001 and text-embedding-004

new settings

  • Notes: Context tokens: the number of context tokens to extract from notes in "Chat with your notes" (previously used Chat: Memory tokens)
  • Notes: Context history: the number of user prompts to base notes context on for "Chat with your notes"
  • Notes: Custom prompt: the prompt (or additional instructions) to use for generating "Chat with your notes" responses
  • Notes: Parallel jobs: the number of parallel jobs to use for calculating text embeddings

chat improvements

  • chat display format (screenshot below)
  • chat with notes default prompt
  • chat parsing

general improvements

  • CodeMirror 6 / beta editor support
  • load USE from cache instead of re-downloading every time
  • faster model test on startup / model switch
  • various fixes

ux

  • new standard dialog style

Thanks @davadev for the help testing this release.

Screenshot 1: New edit dialog

Screenshot 2: New chat context preview

Screenshot 3: New chat display format

Jarvis is awesome, thanks!

Apparently PaLM keys work with Gemini, so I picked the 1.5 model even though I don't have a paid account, and it still works :thinking: I don't know if they just silently fall back to the 1.0 model or decided on a grace period; the free preview access to 1.5 was supposed to end May 1st.
