How to use user-anto/Axiom-Dense-380M-Base with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="user-anto/Axiom-Dense-380M-Base", trust_remote_code=True)
```

```python
# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("user-anto/Axiom-Dense-380M-Base", trust_remote_code=True, dtype="auto")
```

How to use user-anto/Axiom-Dense-380M-Base with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "user-anto/Axiom-Dense-380M-Base"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "user-anto/Axiom-Dense-380M-Base",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker

```shell
docker model run hf.co/user-anto/Axiom-Dense-380M-Base
```

How to use user-anto/Axiom-Dense-380M-Base with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "user-anto/Axiom-Dense-380M-Base" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "user-anto/Axiom-Dense-380M-Base",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "user-anto/Axiom-Dense-380M-Base" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "user-anto/Axiom-Dense-380M-Base",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

How to use user-anto/Axiom-Dense-380M-Base with Docker Model Runner:

```shell
docker model run hf.co/user-anto/Axiom-Dense-380M-Base
```
Axiom-Dense-380M-Base
Axiom-Dense-380M-Base is a decoder-only causal language model trained from scratch for general-purpose next-token prediction on English web text. This is a base pretrained model, not an instruction-tuned chat model.
Model Summary
- Model type: decoder-only Transformer (causal LM)
- Parameter count: 385,849,344
- Context length: 1,024 tokens
- Vocabulary: 100,277 (tiktoken cl100k_base)
- Training objective: autoregressive next-token prediction
- Special handling: tied input/output embeddings (embed.weight tied to lm_head.weight)
Architecture
The model is a dense Transformer stack with grouped-query attention and rotary positional embeddings.
- Hidden size: 1024
- Layers: 24
- Attention heads: 16
- KV heads: 8 (GQA)
- FFN multiplier: 2.6667 (FFN hidden size rounded to a hardware-friendly multiple)
- Normalization: RMSNorm
- Positional encoding: RoPE (theta=10000)
- Activation: SwiGLU
- Dropout: 0.0
Implementation details are defined in:
- model.py (core architecture and generation)
- config.py (ModelConfig, TrainConfig)
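The parameter count in the summary can be cross-checked from the architecture values listed in this section. The sketch below is a back-of-the-envelope tally, not the code in model.py. It assumes bias-free linear layers, two RMSNorm weight vectors per layer plus one final norm, and an FFN hidden size rounded up to a multiple of 256 (which gives 2816). None of these three choices is stated in this card; they are assumptions, and the function name `count_parameters` is hypothetical.

```python
import math


def count_parameters(
    vocab_size: int = 100_277,
    hidden: int = 1024,
    layers: int = 24,
    n_heads: int = 16,
    n_kv_heads: int = 8,
    ffn_mult: float = 2.6667,
    ffn_multiple_of: int = 256,  # assumption: rounding granularity is not given in the card
) -> int:
    """Tally weights for a dense GQA + SwiGLU decoder with tied embeddings."""
    head_dim = hidden // n_heads        # 1024 / 16 = 64
    kv_dim = n_kv_heads * head_dim      # 8 * 64 = 512 (each KV head serves 2 query heads)

    # Assumption: FFN width is rounded UP to the next multiple of ffn_multiple_of.
    ffn_hidden = math.ceil(ffn_mult * hidden / ffn_multiple_of) * ffn_multiple_of

    embed = vocab_size * hidden                         # counted once: lm_head is tied to it
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim    # q and o projections, then k and v
    ffn = 3 * hidden * ffn_hidden                       # SwiGLU: gate, up, down (no biases assumed)
    norms = 2 * hidden                                  # two RMSNorm weight vectors per layer

    return embed + layers * (attn + ffn + norms) + hidden  # + final RMSNorm


total = count_parameters()
```

Under these assumptions `total` comes out to 385,849,344, which matches the parameter count reported in the summary. A different rounding rule or the presence of bias terms would change the result.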
Training Data
- Source dataset: HuggingFaceFW/fineweb-edu, sample-10BT split
- Local dataset path during training: data/fineweb-edu-10BT
- Text field: text
- Validation split strategy: deterministic hash split with val_fraction=0.001 and split_seed=1337
- Document boundary treatment: EOS token appended after each document
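A deterministic hash split assigns each document to train or validation from a hash of a stable key, so the assignment does not depend on iteration order or on random state. The sketch below illustrates the idea only. The use of SHA-256, the `doc_key` argument, and the function name `is_validation` are assumptions; the actual logic in data.py may hash different fields or use a different hash function.

```python
import hashlib


def is_validation(doc_key: str, val_fraction: float = 0.001, split_seed: int = 1337) -> bool:
    """Return True if the document falls into the validation split.

    doc_key is any stable per-document identifier (hypothetical here,
    e.g. a document id or the text itself).
    """
    # Mix the seed into the hashed bytes so a different seed gives a different split.
    digest = hashlib.sha256(f"{split_seed}:{doc_key}".encode("utf-8")).digest()
    # Map the first 8 bytes to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < val_fraction


# Usage: route a stream of (key, text) pairs into the two splits.
docs = [(f"doc-{i}", f"text {i}") for i in range(5)]
train = [text for key, text in docs if not is_validation(key)]
val = [text for key, text in docs if is_validation(key)]
```

Because the decision depends only on the key and the seed, re-running the pipeline or streaming the dataset in a different order yields the same split.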
Training Setup
- Target tokens: 8,000,000,000
- Effective tokens per optimizer step: 327,680 (batch_size=1, seq_len=1024, grad_accum=320)
- Computed optimizer steps: 24,414
- Planned tokens represented by training schedule: 7,999,979,520
- Optimizer: AdamW8bit (fallback to AdamW if unavailable)
- LR schedule: warmup, constant phase, cosine decay
- Warmup steps: 2,000
- LR max/min: 3e-4 / 1e-5
- Weight decay: 0.1
- Betas: (0.9, 0.95)
- Gradient clipping: 1.0
- Precision: bfloat16
- Gradient checkpointing: enabled
- Compile: disabled in provided config
Evaluation Snapshot
Validation metrics in this repo are logged to eval.csv at checkpoint intervals.
- Best observed eval loss: 2.7394 at step 15,000
- Best observed eval perplexity: 15.4780 at step 15,000
- Final logged eval loss: 2.8972 at step 24,000
- Final logged eval perplexity: 18.1233 at step 24,000
These are internal development metrics on the project validation split, not a broad benchmark suite.
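The reported loss and perplexity figures are related by exponentiation. The minimal check below assumes the logged loss is mean token-level cross-entropy in nats (natural logarithm), which is consistent with the numbers above; the function name `perplexity` is illustrative.

```python
import math


def perplexity(loss_nats: float) -> float:
    """Perplexity is exp of the mean cross-entropy loss measured in nats."""
    return math.exp(loss_nats)


best_ppl = perplexity(2.7394)    # eval loss at step 15,000
final_ppl = perplexity(2.8972)   # eval loss at step 24,000
```

Both values agree with the logged perplexities (15.4780 and 18.1233) up to the rounding of the four-decimal loss.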
Intended Use
- Continued pretraining
- Supervised finetuning or instruction tuning
- Research and experimentation on medium-scale dense LMs
- Educational use for studying custom Transformer implementations
Out-of-Scope / Not Recommended
- Safety-critical or high-stakes decisions (medical, legal, financial)
- Direct deployment as a reliable assistant without task-specific alignment and evaluation
- Use cases requiring guaranteed factual accuracy
Limitations
- Base model behavior: may produce repetitive, off-topic, or hallucinatory outputs
- No instruction tuning by default
- English-centric training distribution
- Context window limited to 1,024 tokens
- Bias/toxicity risks inherited from web-scale text data
Safety and Risk Notes
Potential harms include generation of incorrect, biased, or unsafe text. Downstream users should add:
- Domain-specific evaluation
- Prompt and output safety filtering
- Human oversight for sensitive workflows
- Red-teaming before production release
Tokenization
- Tokenizer backend: tiktoken
- Encoding: cl100k_base
- Vocab size: 100,277
- EOS token: tokenizer eot_token
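The tokenizer settings above combine with the document boundary rule from the training data section (an EOS token appended after each document). The sketch below shows one way that could look when packing documents into fixed-length training windows. The function `pack_with_eos`, the choice to drop the trailing partial window, and the toy encoder are assumptions made for illustration; the packing code in data.py may differ.

```python
from typing import Callable, Iterable, List


def pack_with_eos(
    docs: Iterable[str],
    encode: Callable[[str], List[int]],
    eos_id: int,
    seq_len: int = 1024,
) -> List[List[int]]:
    """Tokenize documents, append EOS after each, and cut into seq_len windows."""
    stream: List[int] = []
    for doc in docs:
        stream.extend(encode(doc))
        stream.append(eos_id)  # marks the document boundary
    n_full = len(stream) // seq_len  # assumption: the trailing partial window is dropped
    return [stream[i * seq_len:(i + 1) * seq_len] for i in range(n_full)]


# Toy encoder so the example runs without extra dependencies.
def toy_encode(text: str) -> List[int]:
    return [ord(ch) for ch in text]


windows = pack_with_eos(["ab", "c"], toy_encode, eos_id=0, seq_len=2)

# With tiktoken installed, the real encoder could be plugged in like this:
#   import tiktoken
#   enc = tiktoken.get_encoding("cl100k_base")
#   windows = pack_with_eos(docs, enc.encode_ordinary, enc.eot_token)
```

Passing the encoder in as a callable keeps the packing logic independent of the tokenizer backend, which makes it easy to test with a stand-in.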
Reproducibility
Core files relevant to reproducibility:
- train.py (training loop, checkpointing, metrics)
- data.py (dataset packing/streaming and deterministic split logic)
- model.py (architecture)
- config.py (model/training hyperparameters)
Seed configuration:
- Python / NumPy / PyTorch seed: 1337
Usage
This repository contains custom model and tokenizer code paths. Load the model either with the project code or through Hugging Face transformers with trust_remote_code=True, which requires the repository to be published with matching auto_map files.