This is indeed an intriguing idea. I think this could solve some of the problems I face when trying to build my support chatbot! I need the chatbot to have enough contextual information (for example, what troubleshooting steps have already been taken, or what device the user is talking about...).

Adding contextual information to the query could perhaps eliminate the need to include chat history, as the context would already be part of the user's query. That way, by treating each question as a new chat query, we could save some valuable tokens for more note chunks! (Or make offline models with smaller context windows more useful!)
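
A minimal sketch of the idea, assuming the known context is tracked as simple key/value pairs. The function name, the field names, and the device are all made up for illustration, not part of any existing tool:

```python
def build_standalone_query(question: str, context: dict[str, str]) -> str:
    """Prefix the question with known context so it can be sent without chat history."""
    # Skip empty values so the prompt only carries facts we actually know.
    facts = [f"{key}: {value}" for key, value in context.items() if value]
    if not facts:
        return question
    return "Context:\n" + "\n".join(facts) + "\n\nQuestion: " + question


# Hypothetical support-chat state, kept outside the model's chat history.
context = {
    "device": "Router X200",
    "steps already tried": "restarted the router; checked the cables",
}
query = build_standalone_query("Why does my Wi-Fi keep dropping?", context)
print(query)
```

Each question then goes out as a fresh, self-contained query, so the tokens that would have gone to chat history can go to retrieved note chunks instead.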