
Libraries

How to use OddTheGreat/Circuitry_24B_V.3 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="OddTheGreat/Circuitry_24B_V.3")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("OddTheGreat/Circuitry_24B_V.3")
model = AutoModelForCausalLM.from_pretrained("OddTheGreat/Circuitry_24B_V.3")


vLLM

How to use OddTheGreat/Circuitry_24B_V.3 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "OddTheGreat/Circuitry_24B_V.3"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OddTheGreat/Circuitry_24B_V.3",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
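
For scripted use, the same request can be sent from Python. This is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are made up for illustration, and the default `base_url` assumes the vLLM server from the command above is listening on port 8000.

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://localhost:8000",
                             model="OddTheGreat/Circuitry_24B_V.3",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    # Same JSON body as the curl call above.
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible completion responses keep the text in choices[i].text
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` should return the generated continuation. Passing `base_url="http://localhost:30000"` points the same helper at a server on a different port.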

Use Docker

docker model run hf.co/OddTheGreat/Circuitry_24B_V.3

SGLang

How to use OddTheGreat/Circuitry_24B_V.3 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "OddTheGreat/Circuitry_24B_V.3" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OddTheGreat/Circuitry_24B_V.3",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "OddTheGreat/Circuitry_24B_V.3" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OddTheGreat/Circuitry_24B_V.3",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'


Circuitry_24B_V.3

This is a merge of pre-trained language models.

Lately I have been experimenting with models, trying to fight the biggest problem of Circuitry v.2: self-censorship. Maybe my system prompt is not good enough, maybe Q4_K_S is too small, I don't know. That is how the idea of a small update to v.2 was born.

However, I messed up the config and made this instead. And somehow it performs better than Circuitry v.2.

But on to the model itself.

The model works great in RP, ERP, and assistant use. It produces better dialogue than v.2, with a tendency toward longer messages and more narration.

The model follows instructions and is consistent in markdown style, with enough attention to detail. It can "remember" something that sits at the beginning of a 12k context, and it isn't too broken even with Q8 context quantization.

Cliches are present (shivers running, heads spinning), though banned strings in SillyTavern (ST) fix that.

The writing style is nice, without scenario bias. It can operate in a grimdark setting as well as in a utopian paradise card.

The model performs better on good cards with dialogue and style examples, but it can also work with half-written garbage.

It easily handles two characters in a scene and remains stable with up to five, but reply length will inflate dramatically.

On the censorship front, there are some improvements. V.2 did not throw refusals, yes, but it avoided explicit language until prompted directly. This one does not shy away from swearing and NSFW, but remains adequate.

Russian was tested in assistant mode, and it was good. Russian RP was not tested.

Tested with the MistralV7-tekken preset at temperature 0.8 (sometimes 1.01) and XTC 0.1 / 0.1.
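
For readers unfamiliar with XTC (Exclude Top Choices), here is a rough sketch of what the two 0.1 values control, as I understand the sampler: with the given probability, every token at or above the threshold is dropped except the least likely of them. The function name `xtc_filter` and the dict-based interface are made up for illustration; this is not the code any particular backend runs.

```python
import random


def xtc_filter(probs, threshold=0.1, probability=0.1, rng=random):
    """Apply an XTC-style filter to a {token: probability} mapping.

    Illustrative sketch only; real backends operate on logit tensors.
    """
    # The filter only triggers on a fraction of sampling steps.
    if rng.random() >= probability:
        return dict(probs)

    # "Top choices" are tokens whose probability reaches the threshold.
    top = [token for token, p in probs.items() if p >= threshold]
    if len(top) < 2:
        # With fewer than two top choices there is nothing to exclude.
        return dict(probs)

    # Keep only the least probable top choice, plus everything below threshold.
    keep = min(top, key=lambda token: probs[token])
    kept = {t: p for t, p in probs.items() if t == keep or p < threshold}

    # Renormalize so the remaining probabilities sum to 1.
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}
```

At 0.1 / 0.1 the filter fires on roughly one step in ten, which fits the light-touch settings listed above.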