It’s a good model, but it still requires 24 GB of VRAM.
I’m waiting until something like llama.cpp is made for this.
Not true. See — or actually nothing to be seen here, since “it just works”: https://github.com/ggerganov/llama.cpp/discussions/3368 and https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF
And here is someone describing how to do the quantization yourself: https://advanced-stack.com/resources/running-inference-using-mistral-ai-first-released-model-with-llama-cpp.html
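For reference, the convert-then-quantize workflow that article walks through looks roughly like this with llama.cpp. Script and binary names (convert.py, quantize, main) have changed across llama.cpp versions, and the ./models/mistral-7b path is just a placeholder for wherever you put the downloaded checkpoint, so treat this as a sketch rather than exact commands:

```shell
# Clone and build llama.cpp (plain CPU build shown; see the repo README for GPU flags)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Convert the Hugging Face checkpoint to GGUF at fp16
# (assumes the Mistral-7B weights were downloaded to ./models/mistral-7b)
python3 convert.py ./models/mistral-7b --outtype f16 \
  --outfile ./models/mistral-7b-f16.gguf

# Quantize to 4-bit; Q4_K_M is a common quality/size trade-off
./quantize ./models/mistral-7b-f16.gguf \
  ./models/mistral-7b-q4_k_m.gguf Q4_K_M

# Run inference on the quantized model
./main -m ./models/mistral-7b-q4_k_m.gguf -p "Hello" -n 128
```

At Q4_K_M a 7B model shrinks to a few gigabytes, which is why the GGUF builds linked above run fine without a 24 GB card.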
Ooh, thanks. 🤗
AFAIK Mistral does already work in llama.cpp, or am I misunderstanding something? I’ve yet to try it.
That’s a great article on a good website: no paywall and no advertising.
What it says about this model is that it’s better than other comparable large language models, which it attributes to a great group of researchers (formerly from Google and Meta) working on it.
They say it is comparatively small at 7 billion parameters. Open source: free to download, free to use, and free to tweak yourself.