Plugin: Jarvis (AI assistant) [v0.8.5, 2024-06-04]

Hello. I am a new Joplin user, about to start using it with the Jarvis plugin. I created a new OpenAI API key and entered it into the Jarvis settings. I then started receiving an error pop-up that is now preventing me from accessing Joplin. I keep clicking 'OK' and even 'Cancel', but the same error message pops up again. What is the best course of action to resolve this issue? My OpenAI API key is from a new, separate account, so I should not be getting errors like this one unless I need a paid subscription.

(In plugin: Jarvis)

Error: Rate limit reached for text-embedding-ada-002 in organization org- on requests per min (RPM): Limit 3, Used 3, Requested 1. Please try again in 20s. Visit :// to learn more. You can increase your rate limit by adding a payment method to your account at ://
Press OK to retry.


I fixed it. I simply created a new key, copied it, pasted it into the entry field, and then deleted the former key. I went back over all of the settings and reset them to their defaults. The error went away. I can't say what the problem was, except that maybe a space character at the end of the key was the culprit.
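For anyone else hitting this, a trailing space or newline is easy to pick up when copy-pasting a key. A quick illustrative sketch of sanitizing the pasted value (the helper name and the `sk-` check are just assumptions for illustration, not part of Jarvis):

```python
def clean_api_key(raw: str) -> str:
    """Strip whitespace that often sneaks in when copy-pasting an API key."""
    key = raw.strip()
    if not key.startswith("sk-"):
        raise ValueError("This does not look like an OpenAI API key")
    return key

# A key pasted with a trailing space and newline comes out clean:
print(clean_api_key("sk-abc123 \n"))  # -> sk-abc123
```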


To be honest @davadev, I haven't been able to run the GPT4All docker since I moved to Apple Silicon exclusively (apparently it's an issue). I was able to run models and test the API on an Intel machine, but I don't have access to it anymore.

Unfortunately, I was also unable to add native support to enable local hosting of LLM models in Jarvis (despite numerous attempts) due to technical issues, so I can't support Mini Orca (or other models) directly.

But I do have good news.

LM Studio was recently brought to my attention. I was a little hesitant at first, because it looks like a free but closed-source commercial product. But it's definitely user-friendly. It has a GPT4All-like GUI, but in addition to a chat window, it also has a server window that lets you host your local LLM. For me it worked very well. I updated the Jarvis guide with setup instructions.

EDIT: I realized it's been a year since Jarvis' initial release, so :birthday:


I just tried it. The setup is super easy. I even found out how to install Orca Mini (one just has to enter the URL of the orca_mini_3B GGUF file).

The only problem I have now is that the input length exceeds the context length (an error in LM Studio). I tried changing Max tokens (2048) and Memory tokens (512) in the Jarvis settings, but I still see about 7000 (tokens?) on the server side.

PS1: It's amazing how far Jarvis has come this year!
PS2: I have a few ideas on how to improve Jarvis' performance even when a weaker model is used, but first I have to get the offline model running. I was inspired by this video: GPT-4 playing Minecraft. I think one could use a carefully crafted system prompt and context prompt in Jarvis to improve model performance, just as they taught GPT-4 to play Minecraft in the video purely by careful prompt crafting, without training the underlying model on Minecraft data. This could be the way to get good enough results even with smaller offline models.
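To make the idea concrete, the "carefully crafted system prompt + context prompt" approach boils down to assembling an OpenAI-style message list. A minimal sketch (the wording and the helper are my own illustration, not Jarvis's actual prompt):

```python
def build_messages(system_prompt, note_chunks, user_prompt):
    """Assemble an OpenAI-style chat message list: a steering system
    prompt, the user's notes as numbered context, then the question."""
    context = "\n\n".join(
        f"[note {i + 1}] {chunk}" for i, chunk in enumerate(note_chunks)
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"{context}\n\n{user_prompt}"},
    ]

messages = build_messages(
    "You are a note-taking assistant. Cite the [note number] of each note you use.",
    ["Meeting notes from Monday.", "Draft blog post about Joplin."],
    "Summarize my notes.",
)
```

Tightening the system prompt this way is often the cheapest lever for small offline models, since no retraining is involved.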

OK, I adjusted the context length to 8000 in LM Studio and it works now. Can it be that the length was measured in characters and not in tokens?

I see on the server side that Jarvis attaches this instruction to the user prompt: "Respond to the user's prompt above. The following are the user's own notes. You may refer to the content of any of the notes, and extend it, but only when it is relevant to the prompt. Always cite the [note number] of each note that you use." I think it is a great prompt, but I would like to tweak it a little. Could you make this prompt adjustable? I haven't seen it anywhere in the Jarvis settings.

This is awesome, thanks for this. Could you add CodePal.AI code generator to the extension?

Glad that you're making progress with offline prompting. Let us know how the tweaking goes. In the meantime, I'll add a recommendation to the guide to increase the context length in LM Studio.

You're right, this prompt is not exposed in the settings currently, but there's no reason not to add it to the advanced section. I'll add it in the next version.

The length, at least on the Jarvis side, is in tokens. It's possible that token counting differs slightly between Jarvis and the server, depending on the model, but I expect them to be very similar. I'll try to have a look too; maybe I missed something (although with other models this was not a problem).
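One way to eyeball the characters-vs-tokens question: English text averages roughly 4 characters per token (a common rule of thumb, not an exact tokenizer; real counts vary per model). Under that assumption, an 8000-character limit corresponds to only about 2000 tokens, which would be consistent with what you saw:

```python
def rough_token_estimate(text: str) -> int:
    """Very rough heuristic: assume ~4 characters per token on average."""
    return max(1, len(text) // 4)

prompt = "x" * 8000          # an 8000-character prompt
print(len(prompt))           # 8000 characters...
print(rough_token_estimate(prompt))  # -> 2000 (roughly, under the heuristic)
```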


I'll have a look, thanks.

Jarvis is becoming very complex with all the possible settings and offline models. Have you considered extending the documentation to include best practices / tips (recommended settings)? I know writing documentation and keeping it up to date is a lot of work, but perhaps the Jarvis community could help out there. I keep returning to this forum to check tips, as there is a lot of information in this thread, but finding it is becoming increasingly difficult as the thread grows.

So far I can report these findings from my offline model tests.

  1. Low-end PCs (like mine) with about 8 GB of RAM will work with ~3B models like Orca Mini, but token generation is too slow. I tried the smaller 1B model TinyLlama, and it seems to offer acceptable performance. Still, the timeout in the Jarvis settings has to be increased to make this work (I changed the 60 seconds to 300). I think for weak hardware it will be crucial to find the best 1-3B parameter model.
  2. Searching Hugging Face for models is still not an easy experience. If I apply filters, I can't use full-text search at the same time. This makes it difficult to find the right model in GGUF format for LM Studio. I think it would be beneficial if the Jarvis community kept documentation of tried models to make things easier for newcomers.
  3. The server log window in LM Studio is very helpful for keeping an overview of what was sent to the LLM, making it easier to check whether some settings must be tweaked or the model simply does not deliver the desired results.
  4. I am not sure if this is a bug or my PC is simply too slow, but sometimes when I start Joplin the Jarvis options are not available at first. I guess this is due to loading the Google semantic search model. I am also not sure whether Jarvis attempts to load the main model (in LM Studio) first, as I could not get semantic search working when I had an offline custom model set but the LM Studio server was off.
  5. Did I see correctly that you test the server connection to the LLM by sending "Hello world"? Some models (hello, Orca, I am looking at you) feel the need to reply with longer messages. The bigger the model, the slower the initial loading of Jarvis will be, due to the long reply and slow token generation.
  6. As I have the LM Studio server window open to see what is happening on the LLM side, I sometimes notice that the reply is nonsense and want to cancel it. How can I go about that? Jarvis doesn't seem to have an option to cancel generation. If I close Joplin, the LLM will still attempt to finish generating. My impression is that even restarting the server won't stop it. Sometimes I solved it by closing everything and opening it again, but since everything loads relatively slowly, this is quite time-consuming.
  7. Would it be possible to speed up loading of the Google semantic search model by making it completely offline? Or perhaps you already have an alternative that does not need an internet connection to load.
  8. In my tests I had an issue when I tried to chat with notes. Even though I had about 10 related notes in the side panel, Jarvis was sending only the first one to the LLM. I suspect this might be related to the chunk size settings. I observed that the note it was picking was another test conversation with Jarvis that had a question of about the same length. In other instances the text sent to the LLM was longer, but I still have the feeling that Jarvis tends to use only one (the first) note. Maybe the 1B model has too small a context size and cuts off all the other related notes. That is why I think more documentation would be beneficial, as getting the settings right on both sides (LM Studio and Jarvis) is quite a time-consuming process.
  9. As I am new to this, perhaps my question is a no-brainer, but I am not quite sure how to verify this: Jarvis sends the instructions to the LLM in the OpenAI API format, right? If I understood correctly, not all Hugging Face models will follow this, and I guess this could cause compatibility issues or at least some weird results in text generation. I think this might be the case for the 1B model I tried and linked above. How can I verify that what Jarvis sends is correctly understood by the LLM? (I am talking only about the instructions.) I was wondering if I should ask you for a feature to adjust these API settings, but as I might get what I wish for :smile: I am reluctant to ask, since I am still not sure I understand it well enough to play with this setting.
  10. As the settings are getting really complex, I was wondering if there is anything besides documentation that could simplify this, like saving settings in different profiles. Switching models also requires switching other settings, and that makes going back and forth more difficult (I tend to forget which settings I have to check).
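On point 10, the profile idea could be as simple as dumping the relevant settings to a named JSON file and reading it back when switching models. A hypothetical sketch (this is not a Jarvis feature; the setting names are made up for illustration):

```python
import json

def save_profile(settings: dict, path: str) -> None:
    """Dump the current model settings to a named profile file."""
    with open(path, "w") as f:
        json.dump(settings, f, indent=2)

def load_profile(path: str) -> dict:
    """Restore a previously saved settings profile."""
    with open(path) as f:
        return json.load(f)

# Example: a profile for a small offline model with a raised timeout.
tinyllama = {"model": "tinyllama-1.1b", "max_tokens": 2048,
             "memory_tokens": 512, "timeout_sec": 300}
save_profile(tinyllama, "tinyllama.json")
print(load_profile("tinyllama.json")["timeout_sec"])  # -> 300
```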

I know I wrote quite a few points, but I hope they can spare you some testing time, @shikuz.


Thanks for all your impressions and tips @davadev, this is great.

I agree that the readme and guide are not enough to navigate through all the options, and I'd be happy to accept community contributions. Perhaps a github wiki with tips / tutorials for various features / models can be a good addition to the project?

At the moment, I'm short on time, so I can only maintain and support the plugin, but can't add new features.

1-2. I agree that keeping a curated list of recommended models could be nice. See my suggestion above to start a wiki. BTW, the guide currently recommends a 600 sec timeout for offline models.

  4. This could be due to a combination of: the deliberate 5-sec delay before Jarvis starts (to let other Joplin components finish loading first) + loading / testing of the embedding model + loading / testing of the chat model.
  5. The models I'm used to usually generate short responses (or perhaps they do it quickly enough that I didn't pay attention), but it may be a good idea to change the test prompt to something simpler like "Just say 'yes'", or similar.
  6. I don't have a solution for this.
  7. As far as I understand, the USE model connects to the server every time Jarvis starts, but doesn't re-download the model; instead it loads a locally cached copy. I don't think I have control over this behavior, but I'll have another look.
  8. By "10 related notes" do you mean that you used the command "Preview chat notes context"? Because this should be identical to what is sent to the model. Hypothesis: your Notes: Max tokens is 512 and your Chat: Memory tokens is 512. So if the note chunks that semantic search found are close to 512 tokens (the max size), or even just larger than 256, there will be room for only a single chunk. Does this make sense?
  9. I'm under the impression that it is LM Studio's job to build an OpenAI-compatible API (the de facto standard for many applications) and translate the messages into prompts adapted to each model. So indeed, what Jarvis does is just make OpenAI API calls. I don't plan to expose more settings at this stage.
  10. I completely understand, and it's generally a good idea, but I'm afraid I won't have the time to implement this soon (PRs are welcome). Hopefully, improving the documentation will be enough to make the settings more accessible.
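The hypothesis about a single chunk filling the memory budget is easy to check with a little arithmetic. A sketch of the assumed greedy packing behavior (my own illustration of the budget math, not Jarvis's actual code):

```python
def pack_chunks(chunk_sizes, budget):
    """Greedily add note chunks (sizes in tokens) until the budget is spent."""
    packed, used = [], 0
    for size in chunk_sizes:
        if used + size > budget:
            break
        packed.append(size)
        used += size
    return packed

# Memory budget of 512 tokens; if the first related chunk is 400 tokens,
# no second chunk of similar size can fit:
print(pack_chunks([400, 380, 350], 512))  # -> [400]
# Smaller chunks would let several notes through:
print(pack_chunks([100, 100, 100], 512))  # -> [100, 100, 100]
```

So reducing the chunk size (or raising the memory-token budget) should let more than one note reach the model.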

On MacOS I got a combination of ollama and LiteLLM to work with Jarvis. So far I have only tested the ollama/orca-mini model but it should work the same for others.

I installed ollama (using the download from their homepage) and LiteLLM (pipx install 'litellm[proxy]')

Then I ran:

ollama run orca-mini

With ollama running in another terminal I ran

 litellm --model ollama/orca-mini --drop_params

--drop_params is important, since "ollama does not support parameters: {'presence_penalty': 0}" and without it you'll get an immediate timeout error from Jarvis.

The settings in Jarvis are:

Chat: OpenAI (or compatible) custom model ID: ollama/orca-mini
Chat: Custom model is a conversation model: true

Chat: Custom model API endpoint:

I entered a dummy OpenAI API key (any value works)
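With these settings, Jarvis simply sends standard OpenAI chat-completions requests to the LiteLLM proxy, which is why the dummy key works. If you want to sanity-check the endpoint yourself, the request body looks roughly like this (a sketch; POST it to `/v1/chat/completions` on your proxy's host and port):

```python
import json

def build_chat_payload(prompt: str, model: str = "ollama/orca-mini") -> str:
    """Build the JSON body of a standard OpenAI chat-completions request."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })

# POST this body to http://<host>:<port>/v1/chat/completions with headers:
#   Content-Type: application/json
#   Authorization: Bearer dummy-key   (any value works for a local proxy)
body = build_chat_payload("Say hello")
```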

Hope this helps!


Also got it to run with Orca2. Just replace all instances of orca-mini with orca2


This is very cool, thanks for sharing @danielw2904. Would you like to add it to the guide? (If you won't I probably will at some point)


Sure, I’ll create a PR.


Just so you know, Ollama is now also available for Windows. :partying_face:

I will try running Jarvis with Ollama and LiteLLM and see how it goes. Mistral 7B seems to have good capabilities, and despite its size, Ollama somehow manages to run it on my PC! Gemma 2B also seems to work well and is super fast, but I think this model will need more system-prompt tweaking to make it work with notes.

@shikuz Any luck testing other local embedding models? I tried Ollama with AnythingLLM, which has a list of supported LLMs, including tools that allow running local embedding models. One of them is From the GitHub repository it does not seem to have a Windows installer, but maybe it could work in Docker under WSL... Maybe you could check it out to get some inspiration for Jarvis.

I also saw this embedding model on Hugging Face:
But I'm struggling to find a way to run it locally, as it doesn't have a GGUF file. And I want to avoid having to make it work with Python. :sweat_smile:


Thanks for the update @davadev. I'd like to mention another open-source local server that was brought to my attention this week (thanks @Willxiam!) called jan. Apparently it is also available for Windows.

It's also long overdue to mention that, thanks to recent contributions, the guide already has instructions for setting up Ollama (thanks @danielw2904!) and for setting up offline embeddings with Xinference (thanks @Hegghammer!). The latter supports a number of embedding models out of the box, including multilingual-e5-large (it worked with Jarvis). There is also the option to add custom models (I believe it supports ggmlv3, ggufv2, and pytorch, which may work with distiluse, but I haven't tried it myself). Xinference has a Docker image.
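Whichever embedding server you pick, semantic search over notes ultimately reduces to comparing vectors. A minimal, model-agnostic cosine-similarity sketch (pure Python; the toy vectors are made up for illustration):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy example: rank note chunks against a query embedding.
query = [0.1, 0.9, 0.0]
note_chunks = {"note A": [0.1, 0.8, 0.1], "note B": [0.9, 0.0, 0.1]}
best = max(note_chunks, key=lambda k: cosine_similarity(query, note_chunks[k]))
print(best)  # -> note A
```

Real embedding models emit vectors with hundreds of dimensions, but the ranking step works exactly the same way.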


Hello Shikuz,

Thank you for your effort. But couldn't you find a free way to leverage artificial intelligence? For example, prefer Google Gemini or Microsoft Copilot. The dollar is very expensive in Turkey, and your plugin, no matter how functional it is, doesn't make much sense in countries with similar situations.

To answer your question in general: check out the Jarvis guide and try setting up a custom model. Most of them are free and offline. Pairing them with Jarvis - a free plugin - makes much sense in the countries you describe.

In your case, I'd suggest trying models with simple setup steps, direct homepage downloads, and Windows compatibility (since you use Windows). Try LM Studio first. If that doesn't work, try jan, following the LiteLLM/ollama settings in the guide as indicated in this comment. If that still doesn't work, try other models or ask for more help.


Thanks for the fantastic response. It's very valuable to have an alternative way to leverage artificial intelligence. I'd like to try this at a suitable time.

Thank you for highlighting the free alternatives.
And from your experience, or if other members can provide insights: do you know of any other FOSS alternatives to OpenAI that are available online?
Additionally, what are the advantages and disadvantages of using an online alternative vs. an offline one, considering factors other than response time?
I assume online services are more powerful than offline ones, aren't they? Or are offline ones good enough for our typical, common requests?

jan works with Jarvis following the guide's LM Studio steps paired with your settings instructions (replacing localhost with (default) or

Also, noting in the guide that a dummy OpenAI API key is needed for offline chat models would be an improvement.