Instructions to use continuum-ai/mixtral-8x7b-instruct-compacted-conservative with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use continuum-ai/mixtral-8x7b-instruct-compacted-conservative with MLX:
    # Make sure mlx-lm is installed
    # pip install --upgrade mlx-lm

    # Generate text with mlx-lm
    from mlx_lm import load, generate

    model, tokenizer = load("continuum-ai/mixtral-8x7b-instruct-compacted-conservative")

    prompt = "Write a story about Einstein"
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

    text = generate(model, tokenizer, prompt=prompt, verbose=True)
- llama-cpp-python
How to use continuum-ai/mixtral-8x7b-instruct-compacted-conservative with llama-cpp-python:
    # !pip install llama-cpp-python
    from llama_cpp import Llama

    llm = Llama.from_pretrained(
        repo_id="continuum-ai/mixtral-8x7b-instruct-compacted-conservative",
        filename="mixtral-8x7b-compacted-Q4_K_M.gguf",
    )

    llm.create_chat_completion(
        messages=[
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    )
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use continuum-ai/mixtral-8x7b-instruct-compacted-conservative with llama.cpp:
Install from brew
    brew install llama.cpp

    # Start a local OpenAI-compatible server with a web UI:
    llama-server -hf continuum-ai/mixtral-8x7b-instruct-compacted-conservative:Q4_K_M

    # Run inference directly in the terminal:
    llama-cli -hf continuum-ai/mixtral-8x7b-instruct-compacted-conservative:Q4_K_M
Install from WinGet (Windows)
    winget install llama.cpp

    # Start a local OpenAI-compatible server with a web UI:
    llama-server -hf continuum-ai/mixtral-8x7b-instruct-compacted-conservative:Q4_K_M

    # Run inference directly in the terminal:
    llama-cli -hf continuum-ai/mixtral-8x7b-instruct-compacted-conservative:Q4_K_M
Use pre-built binary
    # Download pre-built binary from:
    # https://github.com/ggerganov/llama.cpp/releases

    # Start a local OpenAI-compatible server with a web UI:
    ./llama-server -hf continuum-ai/mixtral-8x7b-instruct-compacted-conservative:Q4_K_M

    # Run inference directly in the terminal:
    ./llama-cli -hf continuum-ai/mixtral-8x7b-instruct-compacted-conservative:Q4_K_M
Build from source code
    git clone https://github.com/ggerganov/llama.cpp.git
    cd llama.cpp
    cmake -B build
    cmake --build build -j --target llama-server llama-cli

    # Start a local OpenAI-compatible server with a web UI:
    ./build/bin/llama-server -hf continuum-ai/mixtral-8x7b-instruct-compacted-conservative:Q4_K_M

    # Run inference directly in the terminal:
    ./build/bin/llama-cli -hf continuum-ai/mixtral-8x7b-instruct-compacted-conservative:Q4_K_M
Use Docker
docker model run hf.co/continuum-ai/mixtral-8x7b-instruct-compacted-conservative:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use continuum-ai/mixtral-8x7b-instruct-compacted-conservative with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "continuum-ai/mixtral-8x7b-instruct-compacted-conservative"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "continuum-ai/mixtral-8x7b-instruct-compacted-conservative",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker
docker model run hf.co/continuum-ai/mixtral-8x7b-instruct-compacted-conservative:Q4_K_M
- Ollama
How to use continuum-ai/mixtral-8x7b-instruct-compacted-conservative with Ollama:
ollama run hf.co/continuum-ai/mixtral-8x7b-instruct-compacted-conservative:Q4_K_M
- Unsloth Studio
How to use continuum-ai/mixtral-8x7b-instruct-compacted-conservative with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
    curl -fsSL https://unsloth.ai/install.sh | sh

    # Run unsloth studio
    unsloth studio -H 0.0.0.0 -p 8888

    # Then open http://localhost:8888 in your browser
    # Search for continuum-ai/mixtral-8x7b-instruct-compacted-conservative to start chatting
Install Unsloth Studio (Windows)
    irm https://unsloth.ai/install.ps1 | iex

    # Run unsloth studio
    unsloth studio -H 0.0.0.0 -p 8888

    # Then open http://localhost:8888 in your browser
    # Search for continuum-ai/mixtral-8x7b-instruct-compacted-conservative to start chatting
Using HuggingFace Spaces for Unsloth
    # No setup required
    # Open https://huggingface.co/spaces/unsloth/studio in your browser
    # Search for continuum-ai/mixtral-8x7b-instruct-compacted-conservative to start chatting
- MLX LM
How to use continuum-ai/mixtral-8x7b-instruct-compacted-conservative with MLX LM:
Generate or start a chat session
    # Install MLX LM
    uv tool install mlx-lm

    # Interactive chat REPL
    mlx_lm.chat --model "continuum-ai/mixtral-8x7b-instruct-compacted-conservative"
Run an OpenAI-compatible server
    # Install MLX LM
    uv tool install mlx-lm

    # Start the server (port set explicitly so it matches the curl call below)
    mlx_lm.server --model "continuum-ai/mixtral-8x7b-instruct-compacted-conservative" --port 8000

    # Call the OpenAI-compatible server with curl
    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "continuum-ai/mixtral-8x7b-instruct-compacted-conservative",
            "messages": [
                {"role": "user", "content": "Hello"}
            ]
        }'
- Docker Model Runner
How to use continuum-ai/mixtral-8x7b-instruct-compacted-conservative with Docker Model Runner:
docker model run hf.co/continuum-ai/mixtral-8x7b-instruct-compacted-conservative:Q4_K_M
- Lemonade
How to use continuum-ai/mixtral-8x7b-instruct-compacted-conservative with Lemonade:
Pull the model
    # Download Lemonade from https://lemonade-server.ai/
    lemonade pull continuum-ai/mixtral-8x7b-instruct-compacted-conservative:Q4_K_M
Run and chat with the model
lemonade run user.mixtral-8x7b-instruct-compacted-conservative-Q4_K_M
List all available models
lemonade list
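The llama.cpp, vLLM, and MLX LM entries above each start an OpenAI-compatible server and call it with curl. The same request can be built from Python; the following is a minimal sketch using only the standard library. The helper names `build_chat_request` and `extract_reply` are hypothetical, the base URL is an assumption that depends on which server is running and on which port, and the response layout assumed by `extract_reply` is the usual OpenAI chat-completion shape.

```python
import json
import urllib.request

MODEL_ID = "continuum-ai/mixtral-8x7b-instruct-compacted-conservative"


def build_chat_request(base_url: str, model: str, user_message: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint.

    base_url is an assumption: use the host and port your server actually listens on.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion response."""
    return response_body["choices"][0]["message"]["content"]


# Building the request does not touch the network.
request = build_chat_request("http://localhost:8000", MODEL_ID, "What is the capital of France?")

# Sending it requires one of the servers above to be running:
# with urllib.request.urlopen(request) as response:
#     print(extract_reply(json.load(response)))
```

Keeping request construction separate from sending makes it easy to point the same code at a different server by changing only the base URL.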
{
  "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
  "calibration_corpus": "/mnt/cold/factory-work/_seed_mixtral-8x7b-instruct-compacted-conservative/calibration/heldout_code300.jsonl",
  "calibration_examples": 300,
  "calibration_tokens": 148945,
  "num_hidden_layers": 32,
  "num_experts": 8,
  "num_experts_per_tok": 2,
  "activation_counts": {
    "0": [35102, 33314, 41694, 40524, 37312, 49292, 32904, 27748],
    "1": [32431, 33433, 36391, 49113, 40711, 35885, 32683, 37243],
    "2": [34726, 36438, 40336, 30850, 44129, 41694, 37704, 32013],
    "3": [41900, 38964, 39785, 30893, 37526, 38562, 35186, 35074],
    "4": [34326, 38017, 38548, 36641, 40000, 36877, 36084, 37397],
    "5": [35135, 35846, 37737, 36296, 38141, 38042, 39207, 37486],
    "6": [44852, 41714, 36003, 25429, 37976, 29240, 40461, 42215],
    "7": [39616, 47922, 32744, 40740, 23401, 29345, 40047, 44075],
    "8": [28854, 38312, 33408, 28060, 45525, 33412, 51836, 38483],
    "9": [38312, 35269, 41791, 41427, 31378, 36239, 34143, 39331],
    "10": [38362, 47996, 40021, 37906, 39920, 33228, 26067, 34390],
    "11": [40330, 42584, 41410, 12176, 41097, 39487, 38742, 42064],
    "12": [41322, 34059, 41594, 40201, 38466, 41649, 42746, 17853],
    "13": [34150, 38206, 35849, 35467, 36705, 39945, 36630, 40938],
    "14": [39626, 37258, 16359, 45680, 48583, 35579, 42372, 32433],
    "15": [41804, 30735, 38027, 21279, 43544, 39128, 44441, 38932],
    "16": [30301, 38201, 44200, 29707, 36656, 36987, 46110, 35728],
    "17": [37427, 40051, 43999, 27880, 38465, 44488, 28699, 36881],
    "18": [42908, 35744, 42602, 39877, 35799, 28753, 34876, 37331],
    "19": [44817, 27495, 40313, 31874, 40092, 36939, 41014, 35346],
    "20": [38256, 33764, 35261, 34650, 34012, 43950, 36662, 41335],
    "21": [45071, 47493, 37885, 30225, 28826, 31154, 47964, 29272],
    "22": [31681, 50021, 35349, 31772, 35300, 31493, 49964, 32310],
    "23": [41712, 33148, 32231, 39283, 41130, 35220, 35453, 39713],
    "24": [29448, 37039, 38994, 44754, 35321, 36258, 37896, 38180],
    "25": [33738, 34048, 38945, 37640, 34604, 40844, 35963, 42108],
    "26": [38504, 36821, 41817, 39244, 37611, 30625, 34534, 38734],
    "27": [36510, 33816, 41895, 37649, 35480, 48761, 32096, 31683],
    "28": [39385, 31371, 49298, 35980, 39038, 30833, 32289, 39696],
    "29": [34617, 37242, 36439, 48745, 38976, 36744, 31844, 33283],
    "30": [35870, 40637, 40443, 34592, 53189, 22145, 37817, 33197],
    "31": [34590, 34412, 28823, 53616, 32245, 38255, 40079, 35870]
  },
  "metric_version": "v1.activation_count",
  "tool": "expert_activation_profile.py"
}
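One way to read the profile above is to compare each expert's activation count against its layer total. The sketch below is illustrative only: the file does not say how these counts feed into the compaction, and the function names `expected_activations` and `least_used_experts` are hypothetical. It assumes every calibration token is routed to exactly `num_experts_per_tok` experts per layer, which is consistent with the two layers copied here (each sums to 148945 × 2 = 297890).

```python
from typing import Dict, Tuple

# Two layers copied from the profile; the full file covers layers "0" to "31".
PROFILE = {
    "calibration_tokens": 148945,
    "num_experts_per_tok": 2,
    "activation_counts": {
        "0": [35102, 33314, 41694, 40524, 37312, 49292, 32904, 27748],
        "11": [40330, 42584, 41410, 12176, 41097, 39487, 38742, 42064],
    },
}


def expected_activations(profile: dict) -> int:
    """Activations per layer if every token selects num_experts_per_tok experts (assumption)."""
    return profile["calibration_tokens"] * profile["num_experts_per_tok"]


def least_used_experts(profile: dict) -> Dict[str, Tuple[int, float]]:
    """Map each layer to (index of its least-activated expert, that expert's share of the layer total)."""
    result = {}
    for layer, counts in profile["activation_counts"].items():
        total = sum(counts)
        index = min(range(len(counts)), key=counts.__getitem__)
        result[layer] = (index, counts[index] / total)
    return result


# Check the routing assumption, then report the least-used expert per layer.
for layer, counts in PROFILE["activation_counts"].items():
    assert sum(counts) == expected_activations(PROFILE)

for layer, (index, share) in least_used_experts(PROFILE).items():
    print(f"layer {layer}: expert {index} receives {share:.1%} of activations")
```

With eight experts and two selected per token, a perfectly balanced router would give each expert a 12.5% share, so shares well below that mark the experts the calibration corpus exercised least.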