I have tried to install Xinference, but unfortunately I can't get it to work. I managed to resolve all the Python errors, but the process still gets killed while building the wheel. A quick search on the internet suggests that hardware resources (RAM) are to blame. It seems to be much easier to run LLMs locally than to run embedding models locally.
For my private projects I stick to English, as it is currently the most feasible way to run something locally with few resources. I stumbled across the LLM "Phi-2 Q5_K_M", which seems to be the smallest English RAG-capable model, and I am curious whether anyone can evaluate how well it works for their use case.
The default Jarvis offline embedding model has its limitations: for my latest project it didn't manage to retrieve the right related notes, while OpenAI text-embedding-ada-002 with the same context for semantic search finds exactly the notes I need. On the plus side, I now understand the critical role of the embedding model. Maybe I can get the same result as with OpenAI ada by reformatting my notes or changing the context.
@shikuz I am thinking about how to benchmark the different offline configurations. Maybe we could add some sample notes to the repo that everyone could use to test different models (LLM/embeddings), so that we can either find out which models work best, or what formatting notes need so that even not-so-good embedding models manage to find the relevant information. But I am currently only assuming that my problem lies in the note formatting; to my knowledge it could also be an embedding model lacking data in the domain of my use...
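As a starting point, such a benchmark could be a small script that embeds the shared sample notes, runs a few queries with known expected notes, and reports the hit rate. Here is a minimal sketch of what I have in mind; the bag-of-words `embed` function is just a stand-in so the example runs anywhere, and one would swap in the actual embedding call (Jarvis offline model, OpenAI ada, etc.) to compare models. The note/query data below is invented for illustration:

```python
import math
import re
from collections import Counter

def embed(text):
    # Placeholder embedding: a bag-of-words vector. Replace this with a
    # real embedding model call when benchmarking (e.g. a local model
    # served via Xinference, or OpenAI text-embedding-ada-002).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse vectors (dicts of term -> count).
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recall_at_1(notes, queries):
    # notes: {note_id: text}; queries: [(query_text, expected_note_id)].
    # Returns the fraction of queries whose top-ranked note is the expected one.
    note_vecs = {nid: embed(txt) for nid, txt in notes.items()}
    hits = 0
    for query, expected in queries:
        qv = embed(query)
        best = max(note_vecs, key=lambda nid: cosine(qv, note_vecs[nid]))
        hits += best == expected
    return hits / len(queries)

# Hypothetical sample notes and queries, just to show the harness shape.
notes = {
    "grocery": "Shopping list: milk, eggs, bread, apples",
    "meeting": "Meeting notes: discuss Q3 roadmap with the team",
    "recipe": "Pancake recipe: flour, milk, eggs, butter",
}
queries = [
    ("shopping list with milk and bread", "grocery"),
    ("notes from the Q3 team meeting", "meeting"),
]
print(recall_at_1(notes, queries))
```

With a fixed set of notes and queries checked into the repo, everyone could run the same harness against different embedding models (and different note formattings) and compare the scores directly.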