This could be an option. I just checked, and there are some new text feature extraction (embedding) models on Hugging Face in GGUF format. In practice, they can be run in any of LM Studio, Ollama, or Jan. Ideally, the LLM and the embedding model would run under the same tool to save resources. I am only aware of Ollama being able to serve two models at once, but the question is whether that works with the LiteLLM workaround mentioned earlier.
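For what it's worth, LiteLLM's proxy can route a chat model and an embedding model to the same Ollama instance from one config file; a minimal sketch (the model names `llama3` and `nomic-embed-text` are just placeholders, substitute whatever you actually have pulled in Ollama):

```yaml
# LiteLLM proxy config (e.g. config.yaml), started with: litellm --config config.yaml
# Both entries point at the same local Ollama server, so only one backend runs.
model_list:
  - model_name: chat            # name clients use for completions
    litellm_params:
      model: ollama/llama3      # assumed chat model pulled in Ollama
  - model_name: embed           # name clients use for embeddings
    litellm_params:
      model: ollama/nomic-embed-text  # assumed embedding model in Ollama
```

Whether Ollama actually keeps both models resident at once (rather than swapping them in and out per request) would still need testing, since that depends on available memory and Ollama's model-loading settings.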