mixtral-8x7b-instruct-compacted-conservative / importance.activation_count.json

EnricoFermi

Upload importance.activation_count.json with huggingface_hub

50eaf40 verified about 2 months ago

4.33 kB

	{
	"model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
	"calibration_corpus": "/mnt/cold/factory-work/_seed_mixtral-8x7b-instruct-compacted-conservative/calibration/heldout_code300.jsonl",
	"calibration_examples": 300,
	"calibration_tokens": 148945,
	"num_hidden_layers": 32,
	"num_experts": 8,
	"num_experts_per_tok": 2,
	"activation_counts": {
	"0": [
	35102,
	33314,
	41694,
	40524,
	37312,
	49292,
	32904,
	27748
	],
	"1": [
	32431,
	33433,
	36391,
	49113,
	40711,
	35885,
	32683,
	37243
	],
	"2": [
	34726,
	36438,
	40336,
	30850,
	44129,
	41694,
	37704,
	32013
	],
	"3": [
	41900,
	38964,
	39785,
	30893,
	37526,
	38562,
	35186,
	35074
	],
	"4": [
	34326,
	38017,
	38548,
	36641,
	40000,
	36877,
	36084,
	37397
	],
	"5": [
	35135,
	35846,
	37737,
	36296,
	38141,
	38042,
	39207,
	37486
	],
	"6": [
	44852,
	41714,
	36003,
	25429,
	37976,
	29240,
	40461,
	42215
	],
	"7": [
	39616,
	47922,
	32744,
	40740,
	23401,
	29345,
	40047,
	44075
	],
	"8": [
	28854,
	38312,
	33408,
	28060,
	45525,
	33412,
	51836,
	38483
	],
	"9": [
	38312,
	35269,
	41791,
	41427,
	31378,
	36239,
	34143,
	39331
	],
	"10": [
	38362,
	47996,
	40021,
	37906,
	39920,
	33228,
	26067,
	34390
	],
	"11": [
	40330,
	42584,
	41410,
	12176,
	41097,
	39487,
	38742,
	42064
	],
	"12": [
	41322,
	34059,
	41594,
	40201,
	38466,
	41649,
	42746,
	17853
	],
	"13": [
	34150,
	38206,
	35849,
	35467,
	36705,
	39945,
	36630,
	40938
	],
	"14": [
	39626,
	37258,
	16359,
	45680,
	48583,
	35579,
	42372,
	32433
	],
	"15": [
	41804,
	30735,
	38027,
	21279,
	43544,
	39128,
	44441,
	38932
	],
	"16": [
	30301,
	38201,
	44200,
	29707,
	36656,
	36987,
	46110,
	35728
	],
	"17": [
	37427,
	40051,
	43999,
	27880,
	38465,
	44488,
	28699,
	36881
	],
	"18": [
	42908,
	35744,
	42602,
	39877,
	35799,
	28753,
	34876,
	37331
	],
	"19": [
	44817,
	27495,
	40313,
	31874,
	40092,
	36939,
	41014,
	35346
	],
	"20": [
	38256,
	33764,
	35261,
	34650,
	34012,
	43950,
	36662,
	41335
	],
	"21": [
	45071,
	47493,
	37885,
	30225,
	28826,
	31154,
	47964,
	29272
	],
	"22": [
	31681,
	50021,
	35349,
	31772,
	35300,
	31493,
	49964,
	32310
	],
	"23": [
	41712,
	33148,
	32231,
	39283,
	41130,
	35220,
	35453,
	39713
	],
	"24": [
	29448,
	37039,
	38994,
	44754,
	35321,
	36258,
	37896,
	38180
	],
	"25": [
	33738,
	34048,
	38945,
	37640,
	34604,
	40844,
	35963,
	42108
	],
	"26": [
	38504,
	36821,
	41817,
	39244,
	37611,
	30625,
	34534,
	38734
	],
	"27": [
	36510,
	33816,
	41895,
	37649,
	35480,
	48761,
	32096,
	31683
	],
	"28": [
	39385,
	31371,
	49298,
	35980,
	39038,
	30833,
	32289,
	39696
	],
	"29": [
	34617,
	37242,
	36439,
	48745,
	38976,
	36744,
	31844,
	33283
	],
	"30": [
	35870,
	40637,
	40443,
	34592,
	53189,
	22145,
	37817,
	33197
	],
	"31": [
	34590,
	34412,
	28823,
	53616,
	32245,
	38255,
	40079,
	35870
	]
	},
	"metric_version": "v1.activation_count",
	"tool": "expert_activation_profile.py"
	}
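
The counts above can be sanity-checked and turned into per-expert routing shares. The following is a minimal sketch, not the logic of `expert_activation_profile.py` (which is not shown here). It assumes each calibration token contributes one activation to each of its `num_experts_per_tok` routed experts, so every layer should sum to `calibration_tokens * num_experts_per_tok` (148945 × 2 = 297890). The helper name `layer_summary` and the term "coldest expert" are illustrative, and the excerpt embeds only layers 0 and 11, with values copied from the file.

```python
import json

# Excerpt of the profile above: layers 0 and 11 only, values copied verbatim.
profile = json.loads("""
{
  "calibration_tokens": 148945,
  "num_experts": 8,
  "num_experts_per_tok": 2,
  "activation_counts": {
    "0": [35102, 33314, 41694, 40524, 37312, 49292, 32904, 27748],
    "11": [40330, 42584, 41410, 12176, 41097, 39487, 38742, 42064]
  }
}
""")


def layer_summary(profile):
    """Hypothetical helper: per-layer total check and least-activated expert."""
    # Assumption: top-k routing adds one count per routed expert per token.
    expected_total = profile["calibration_tokens"] * profile["num_experts_per_tok"]
    summary = {}
    for layer, counts in profile["activation_counts"].items():
        total = sum(counts)
        # Index of the expert with the fewest activations in this layer.
        coldest = min(range(len(counts)), key=counts.__getitem__)
        summary[int(layer)] = {
            "total_matches": total == expected_total,
            "coldest_expert": coldest,
            "coldest_share": counts[coldest] / total,
        }
    return summary


summary = layer_summary(profile)
for layer, row in sorted(summary.items()):
    print(layer, row)
```

With 8 experts, a perfectly uniform router would give each expert a share of 0.125, so a share well below that marks an expert that this calibration corpus rarely routes to. Whether such an expert is safe to drop or merge is a separate question that the counts alone do not answer.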