Instructions for using user-anto/Axiom-Dense-380M-Base with libraries and local apps.

Libraries

How to use user-anto/Axiom-Dense-380M-Base with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="user-anto/Axiom-Dense-380M-Base", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("user-anto/Axiom-Dense-380M-Base", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use user-anto/Axiom-Dense-380M-Base with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "user-anto/Axiom-Dense-380M-Base"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "user-anto/Axiom-Dense-380M-Base",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/user-anto/Axiom-Dense-380M-Base

SGLang

How to use user-anto/Axiom-Dense-380M-Base with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "user-anto/Axiom-Dense-380M-Base" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "user-anto/Axiom-Dense-380M-Base",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "user-anto/Axiom-Dense-380M-Base" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "user-anto/Axiom-Dense-380M-Base",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner

How to use user-anto/Axiom-Dense-380M-Base with Docker Model Runner:

```
docker model run hf.co/user-anto/Axiom-Dense-380M-Base
```

Axiom-Dense-380M-Base

Axiom-Dense-380M-Base is a decoder-only causal language model trained from scratch for general-purpose next-token prediction on English web text. This is a base pretrained model, not an instruction-tuned chat model.

Model Summary

Model type: decoder-only Transformer (causal LM)
Parameter count: 385,849,344
Context length: 1,024 tokens
Vocabulary: 100,277 (tiktoken cl100k_base)
Training objective: autoregressive next-token prediction
Special handling: tied input/output embeddings (embed.weight tied to lm_head.weight)

Architecture

This model follows a dense Transformer stack with grouped-query attention and rotary positional embeddings.

Hidden size: 1024
Layers: 24
Attention heads: 16
KV heads: 8 (GQA)
FFN multiplier: 2.6667 × hidden size (resulting FFN width rounded to a hardware-friendly multiple)
Normalization: RMSNorm
Positional encoding: RoPE (theta=10000)
Activation: SwiGLU
Dropout: 0.0
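
As a sanity check, the parameter count reported in the Model Summary can be reproduced from the hyperparameters above. This is a sketch under assumptions the card does not state: no bias terms, two RMSNorm weight vectors per block plus one final norm, a SwiGLU FFN with three projection matrices, and an FFN width rounded up to a multiple of 256 (`FFN_ROUND_TO` is a hypothetical name; the actual rounding rule lives in model.py/config.py).

```python
import math

# Values taken from the model card.
HIDDEN = 1024
LAYERS = 24
N_HEADS = 16
N_KV_HEADS = 8
VOCAB = 100_277
FFN_MULT = 2.6667

# Assumption: FFN width is rounded up to a multiple of 256.
FFN_ROUND_TO = 256

head_dim = HIDDEN // N_HEADS                       # 64
kv_dim = N_KV_HEADS * head_dim                     # 512 (GQA: fewer K/V heads)
ffn_dim = math.ceil(HIDDEN * FFN_MULT / FFN_ROUND_TO) * FFN_ROUND_TO

# Attention: Q and O are hidden x hidden, K and V are hidden x kv_dim.
attn_params = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim
# SwiGLU FFN: gate, up, and down projections (assumed bias-free).
ffn_params = 3 * HIDDEN * ffn_dim
# Two RMSNorm weight vectors per block (assumed).
norm_params = 2 * HIDDEN

per_layer = attn_params + ffn_params + norm_params
# Tied embeddings: the vocabulary matrix is counted once.
embedding_params = VOCAB * HIDDEN
total = embedding_params + LAYERS * per_layer + HIDDEN  # + final RMSNorm

print(f"ffn_dim={ffn_dim}, total={total:,}")
```

Under these assumptions the FFN width comes out to 2816 and the total matches the reported 385,849,344 parameters.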

Implementation details are defined in:

model.py (core architecture and generation)
config.py (ModelConfig, TrainConfig)

Training Data

Source dataset: HuggingFaceFW/fineweb-edu, sample-10BT split
Local dataset path during training: data/fineweb-edu-10BT
Text field: text
Validation split strategy: deterministic hash split with val_fraction=0.001 and split_seed=1337
Document boundary treatment: EOS token appended after each document
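
A deterministic hash split of the kind described above can be sketched as follows. This is illustrative only and not the logic in data.py: the function name `is_validation`, the use of SHA-256, and the choice of a string document key are all assumptions. Only `val_fraction=0.001` and `split_seed=1337` come from the card.

```python
import hashlib

VAL_FRACTION = 0.001  # from the model card
SPLIT_SEED = 1337     # from the model card


def is_validation(doc_key: str,
                  val_fraction: float = VAL_FRACTION,
                  split_seed: int = SPLIT_SEED) -> bool:
    """Assign a document to the validation split based on a seeded hash.

    The same (seed, key) pair always lands in the same split, regardless of
    the order in which documents are streamed.
    """
    digest = hashlib.sha256(f"{split_seed}:{doc_key}".encode("utf-8")).digest()
    # Map the first 8 bytes of the digest to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < val_fraction


# Hypothetical document keys, used only to show the approximate split ratio.
keys = [f"doc-{i}" for i in range(100_000)]
val_count = sum(is_validation(k) for k in keys)
print(f"validation share: {val_count / len(keys):.4%}")
```

Because the assignment depends only on the seed and the document key, the split stays stable across runs and across dataset shards.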

Training Setup

Target tokens: 8,000,000,000
Effective tokens per optimizer step: 327,680 (batch_size=1, seq_len=1024, grad_accum=320)
Computed optimizer steps: 24,414
Tokens covered by the planned schedule: 7,999,979,520 (24,414 steps × 327,680 tokens)
Optimizer: AdamW8bit (falls back to AdamW if unavailable)
LR schedule: warmup, then a constant phase, then cosine decay
Warmup steps: 2,000
LR max/min: 3e-4 / 1e-5
Weight decay: 0.1
Betas: (0.9, 0.95)
Gradient clipping: 1.0
Precision: bfloat16
Gradient checkpointing: enabled
Compile: disabled in provided config
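
The step arithmetic and the shape of the learning-rate schedule above can be sketched as below. The token and step figures follow directly from the listed values. The schedule function is an assumption-laden illustration: the card does not say whether warmup is linear or how long the constant phase lasts, so `CONSTANT_STEPS` and the linear warmup are hypothetical choices, not the settings in train.py.

```python
import math

# Values taken from the model card.
TARGET_TOKENS = 8_000_000_000
BATCH_SIZE, SEQ_LEN, GRAD_ACCUM = 1, 1024, 320
WARMUP_STEPS = 2_000
LR_MAX, LR_MIN = 3e-4, 1e-5

# Hypothetical: the card does not give the constant-phase length.
CONSTANT_STEPS = 10_000

tokens_per_step = BATCH_SIZE * SEQ_LEN * GRAD_ACCUM   # 327,680
total_steps = TARGET_TOKENS // tokens_per_step        # 24,414
planned_tokens = total_steps * tokens_per_step        # 7,999,979,520


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (0-indexed)."""
    if step < WARMUP_STEPS:
        # Assumed linear warmup up to LR_MAX.
        return LR_MAX * (step + 1) / WARMUP_STEPS
    decay_start = WARMUP_STEPS + CONSTANT_STEPS
    if step < decay_start:
        return LR_MAX
    # Cosine decay from LR_MAX down to LR_MIN over the remaining steps.
    progress = min(1.0, (step - decay_start) / max(1, total_steps - decay_start))
    return LR_MIN + 0.5 * (LR_MAX - LR_MIN) * (1.0 + math.cos(math.pi * progress))


print(tokens_per_step, total_steps, planned_tokens)
for step in (0, 1_000, 2_000, 12_000, 18_000, total_steps):
    print(step, f"{lr_at(step):.2e}")
```

The integer division explains why the planned schedule covers slightly fewer tokens than the 8,000,000,000 target.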

Evaluation Snapshot

Validation metrics are logged to eval.csv in this repo at regular checkpoint intervals.

Best observed eval loss: 2.7394 at step 15,000
Best observed eval perplexity: 15.4780 at step 15,000
Final logged eval loss: 2.8972 at step 24,000
Final logged eval perplexity: 18.1233 at step 24,000

These are internal development metrics on the project validation split, not a broad benchmark suite.
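
The loss and perplexity figures above are related by the usual identity, perplexity = exp(mean cross-entropy loss in nats). A minimal check, assuming the logged loss is a per-token mean in natural-log units (the helper name `perplexity` is illustrative):

```python
import math


def perplexity(mean_nll: float) -> float:
    """Convert a mean per-token negative log-likelihood (nats) to perplexity."""
    return math.exp(mean_nll)


# Loss values taken from the evaluation snapshot.
for label, loss in [("best (step 15,000)", 2.7394), ("final (step 24,000)", 2.8972)]:
    print(f"{label}: loss={loss:.4f} perplexity={perplexity(loss):.2f}")
```

This gives roughly 15.48 and 18.12, which agrees with the reported perplexities up to rounding of the logged loss.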

Intended Use

Continued pretraining
Supervised finetuning or instruction tuning
Research and experimentation on medium-scale dense LMs
Educational use for studying custom Transformer implementations

Out-of-Scope / Not Recommended

Safety-critical or high-stakes decisions (medical, legal, financial)
Direct deployment as a reliable assistant without task-specific alignment and evaluation
Use cases requiring guaranteed factual accuracy

Limitations

Base model behavior: may produce repetitive, off-topic, or hallucinatory outputs
Not instruction-tuned
English-centric training distribution
Context window limited to 1,024 tokens
Bias/toxicity risks inherited from web-scale text data

Safety and Risk Notes

Potential harms include generation of incorrect, biased, or unsafe text. Downstream users should add:

Domain-specific evaluation
Prompt and output safety filtering
Human oversight for sensitive workflows
Red-teaming before production release

Tokenization

Tokenizer backend: tiktoken
Encoding: cl100k_base
Vocab size: 100,277
EOS token: the tokenizer's eot_token
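
To illustrate how an EOS token appended after each document interacts with fixed-length training sequences, here is a minimal packing sketch. It is not the code in data.py: `pack_documents` is a hypothetical helper, and dropping the trailing partial chunk is an assumption. The id 100257 is the `eot_token` of tiktoken's cl100k_base encoding. In practice the token lists would come from something like `tiktoken.get_encoding("cl100k_base").encode_ordinary(text)`, but plain integer lists are used here so the example runs without extra dependencies.

```python
from typing import Iterable, Iterator, List

EOT_ID = 100_257   # eot_token of tiktoken's cl100k_base encoding
SEQ_LEN = 1024     # context length from the model card


def pack_documents(docs: Iterable[List[int]],
                   seq_len: int = SEQ_LEN,
                   eot_id: int = EOT_ID) -> Iterator[List[int]]:
    """Concatenate tokenized documents, separated by EOS, into fixed-size chunks."""
    buffer: List[int] = []
    for tokens in docs:
        buffer.extend(tokens)
        buffer.append(eot_id)          # mark the document boundary
        while len(buffer) >= seq_len:
            yield buffer[:seq_len]
            buffer = buffer[seq_len:]
    # Assumption: a trailing partial chunk is dropped rather than padded.


# Tiny example with seq_len=4 so the chunking is easy to follow.
toy_docs = [[1, 2, 3], [4, 5], [6]]
chunks = list(pack_documents(toy_docs, seq_len=4))
print(chunks)
```

With the toy input, the token stream is `1 2 3 EOT 4 5 EOT 6 EOT`, so two full chunks are emitted and the final lone EOT is discarded.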

Reproducibility

Core files relevant to reproducibility:

train.py (training loop, checkpointing, metrics)
data.py (dataset packing/streaming and deterministic split logic)
model.py (architecture)
config.py (model/training hyperparameters)

Seed configuration:

Python / NumPy / PyTorch seed: 1337

Usage

This repository contains custom model and tokenizer code. Load the model either with the project code or through Hugging Face transformers with trust_remote_code=True, provided the repository is published with matching auto_map files.

