Thanks for the feedback @davadev! This is definitely a work in progress and any feedback helps!
Added notebook exclusion / inclusion to v0.4.2. Once you select a notebook and run "Exclude notebook from note DB", the notebook and all its sub-notebooks will be excluded starting from the next update of the DB.
Regarding the model, the default gpt-3.5-turbo should give a good trade-off between response quality, speed, and cost.
For some use cases, I believe that simple semantic search (related notes) alone could be pretty effective, because it will point you to the specific sections of your notes where relevant information exists. For example: (1) write your query; (2) select the query text; (3) hit "Find related notes". This is also a way of understanding why the chat missed your point (what it essentially does is search for related notes based on your query and chat history), so you can check in advance which notes are going to come up for your query.

I have also noticed that some queries work better than others, and I believe this is partly affected by how the notes on each subject are structured and written.
There are a few directions to improve this:
- Improving the embeddings, that is, the mathematical representation of the notes (a rough sketch of how embedding-based retrieval works appears after this list). The current model has its pros (offline, fast), but better (and usually much heavier) ones exist, and I plan to add support for additional models.
- Improving the processing of retrieved notes in order to generate a response.
- Improving the chunking of the notes so that their context is optimized.
- Very careful prompting.
(Working on it...)
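To make the retrieval idea above concrete, here is a minimal, hypothetical sketch in Python of embedding-based search over note chunks. This is not the plugin's actual code: the `sentence-transformers` model, the chunk size, and the helper functions are all illustrative assumptions.

```python
# Illustrative sketch only (not the plugin's code): rank note chunks by
# cosine similarity of their embeddings to a query.
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed example library

# Hypothetical stand-in for the plugin's offline embedding model.
model = SentenceTransformer("all-MiniLM-L6-v2")

def chunk(text: str, size: int = 200) -> list[str]:
    # Naive fixed-size chunking; smarter chunking would keep headings and
    # paragraphs intact so each excerpt is self-contained.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def find_related(query: str, notes: list[str], top_k: int = 3) -> list[str]:
    chunks = [c for note in notes for c in chunk(note)]
    emb = model.encode(chunks + [query])
    chunk_emb, query_emb = emb[:-1], emb[-1]
    # Cosine similarity between the query and every chunk.
    sims = chunk_emb @ query_emb / (
        np.linalg.norm(chunk_emb, axis=1) * np.linalg.norm(query_emb))
    best = np.argsort(sims)[::-1][:top_k]
    return [chunks[i] for i in best]
```

Better chunking (the third point above) is about picking chunk boundaries so that each retrieved excerpt carries enough context on its own, which directly improves what ends up in the prompt.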
Additional information on tokens: there are two settings for max tokens: (1) "Model: Max Tokens", which is recommended to be set to the maximum (~4000); (2) "Chat: Memory Tokens", which can be large but should stay well below Max Tokens (how much depends on the use case). I recommend setting it to 1000-1500 at most if you are getting these messages.
(Long explanation follows...)
Because in the chat GPT receives both the history of your conversation and excerpts from your notes, the token budget works out as follows:
min(length of chat so far, Memory Tokens) + min(length of note excerpts, Memory Tokens) + length of GPT's response < Max Tokens
So, if the chat is already pretty long (>2000 tokens) and Memory Tokens is set very high (e.g., >2000 tokens), there is no space left for GPT's response. I'll try to think of ways to either warn the user about such situations or circumvent them. In any case, either keep your chats shorter (open new notes for new topics) or decrease Memory Tokens to 1000-1500. This will become less of an issue over time: models that support as many as 100K tokens already exist, and will probably become widespread sooner or later.
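To see the arithmetic in action, here is a small sketch of the budget formula above (plain Python; the function name and numbers are illustrative, not from the plugin):

```python
# Sketch of the token budget: what remains for GPT's response after the
# chat history and note excerpts are packed into the prompt.
def response_budget(chat_tokens: int, notes_tokens: int,
                    memory_tokens: int, max_tokens: int) -> int:
    history = min(chat_tokens, memory_tokens)  # chat history kept in the prompt
    notes = min(notes_tokens, memory_tokens)   # note excerpts added to the prompt
    return max_tokens - history - notes        # space left for the response

# A long chat (2500 tokens) with Memory Tokens = 2000 leaves nothing:
print(response_budget(2500, 2000, 2000, 4000))  # -> 0
# Lowering Memory Tokens to 1200 restores room for the response:
print(response_budget(2500, 2000, 1200, 4000))  # -> 1600
```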