Great suggestions @JamesWriterNarry, I'm very interested in the directions that agents can take us. As I mentioned earlier (thanks @davadev), due to time constraints, and because the next release already includes a long list of small-to-medium fixes and updates (many of which still need to be implemented), I won't be able to work on large features for the coming release, but perhaps later this year. The same goes for cool / experimental chunking methods.

In the meantime I'd like to share a pro tip that I heard recently.

Pro Tip: Better note chunking

  • One thing that can improve the discoverability of your chunks is to set Notes: Max tokens to a low value (128-256 tokens). The idea is that smaller chunks are more homogeneous in content and are therefore represented by more distinct embeddings. This way, search may be able to pinpoint more accurately where the relevant information lives in your notebooks.
  • To avoid ending up with a very short context as a result of the smaller chunks, tell Jarvis to include a few (1-2) preceding and following chunks for every chunk that it finds, using the advanced settings Notes: Number of leading blocks to add and Notes: Number of trailing blocks to add (see the sketch right after this list).
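
To illustrate the general idea, here is a minimal sketch (this is not Jarvis's actual code; the function names and the crude word-based token count are only for illustration): split each note into small blocks, and when a block comes up in search, return it together with a few of its neighbours.

# Rough sketch of "small chunks + neighbouring blocks", not Jarvis's implementation.

def split_into_blocks(text, max_tokens=256):
    # crude stand-in for real tokenization: treat whitespace-separated words as tokens
    words = text.split()
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]

def expand_hit(blocks, hit_index, leading=1, trailing=1):
    # return the matched block together with its neighbours, for a longer context
    start = max(0, hit_index - leading)
    end = min(len(blocks), hit_index + trailing + 1)
    return "\n".join(blocks[start:end])

# e.g.: blocks = split_into_blocks(note_text, 256)
#       context = expand_hit(blocks, best_match_index, leading=1, trailing=1)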

I'm running Xinference on 8 GB and 16 GB laptops, and it seems to work well. The 16 GB one is currently running both mistral-instruct-v0.2 (chat) and mxbai-embed-large-v1 (embedding) and is doing alright. Mistral is supported out of the box, and mxbai I configured manually (I'll be happy to share the JSON file). I used pip to set up Xinference, but there seem to be Docker images for it as well. In fact, Mistral is so lovely (albeit a little too chatty; perhaps I should tinker with its system message) that I'm considering going completely offline (I have stuck with GPT until now).
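
For reference, the pip route looked roughly like this on my machines (check the Xinference docs for the exact extras your backend needs):

pip install "xinference[all]"
xinference-local -H 0.0.0.0

Models can then be launched from the web UI that the server exposes on port 9997 by default.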

EDIT: If you are having a hard time installing Xinference, and in addition do not have an Nvidia GPU, I recommend installing Docker and doing the following (select a /path/on/your/pc to store downloaded models):

docker pull xprobe/xinference:latest-cpu

docker run --name xinfer -d -p 9997:9997 -e XINFERENCE_HOME=/data -v /path/on/your/pc:/data xprobe/xinference:latest-cpu xinference-local -H 0.0.0.0

(For GPUs see here)
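
Either way, once the server is running you can check that it's reachable. I believe Xinference exposes an OpenAI-compatible API on the same port, so something like this should list the launched models:

curl http://localhost:9997/v1/models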

There are standard benchmarks for models such as this one. Naturally, it's best to have a benchmark that is closer to your use case (e.g., chatting over notes). If you have ideas for Jarvis benchmarks, I'll be happy to collaborate: post on GitHub, or link to a webpage from the repo.

OpenAI recently released new embedding models, which are supposed to be even better. I'll add them in the next release, but you can already start using them via the custom model setting. Still, I think that there are offline models (such as mxbai-embed-large-v1) that perform better on the benchmarks that I linked above.
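
If I'm not mistaken, these are the text-embedding-3-small / text-embedding-3-large models. As a quick sanity check of your API key and the model name outside Jarvis (assuming OPENAI_API_KEY is set in your environment):

curl https://api.openai.com/v1/embeddings \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-3-small", "input": "hello notes"}'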
