microclaw-for-openclaw – Fallback Agent for OpenClaw (v2026.2.17)
Model ID: webxos/microclaw-for-openclaw-version-2026.2.17
Tags: openclaw, fallback-agent, grpo, vae, kv-cache, dpo, tool-masking, uncertainty, rag, semantic-cache, soul.md, huggingface-space, gguf, llm-distillation
Overview
microclaw (v2026.2.17) is a lightweight, distilled language model designed as a fallback agent for the OpenClaw ecosystem. When the primary agent loses connectivity or requires offline operation, microclaw steps in to handle essential system tasks: file management, status checks, cron jobs, and simple Q&A.
WARNING: You will need to train your own GGUF model locally. The microclaw.gguf presented in this repo is a lightweight placeholder so users can scale and build their own local models with llama.cpp.
You will need to configure your own build locally from scratch with this model; it is still under development and testing. This version is made to integrate directly with Openclaw.ai on port 18789, and this README presents multiple ways, some optional, to configure the agent on your local Debian-based Linux machines.
This version introduces advanced training and inference enhancements:
- Tool-use masking and schema-first training for reliable function calling.
- Direct Preference Optimization (DPO) to align outputs with human preferences.
- Uncertainty estimation with configurable thresholds for safe escalation.
- Retrieval-Augmented Generation (RAG) with semantic chunking.
- Semantic KV-cache for high-similarity query reuse.
- Quantization (down to 2-bit) and pruning for extreme memory efficiency.
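The uncertainty-escalation behavior above can be sketched with token-level entropy: when the next-token distribution is too flat, the agent should hand off to a safe canned response instead of guessing. A minimal illustration (the 1.0-nat threshold is an arbitrary placeholder, not a value from this repo):

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_escalate(probs, threshold=1.0):
    """True when the model is too uncertain and a safe fallback should be used."""
    return token_entropy(probs) > threshold

# A peaked distribution has low entropy; a flat one approaches log(vocab_size).
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]
```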
The repository contains the full and partially trained model files, configuration (soul.md, AGENTS.md, HEARTBEAT.md, SECURITY.md), and export bundles ready for deployment to Hugging Face Spaces or local execution with OpenClaw.
Key Features
- GRPO (Group Relative Policy Optimization) – Trains the agent with group-wise advantage estimation for stable policy updates.
- VAE Filter – A Variational Autoencoder that filters low-quality training samples, improving output coherence.
- Tool-Use Masking – Masks non-tool tokens during training to enforce strict schema adherence (JSON/YAML).
- DPO (Direct Preference Optimization) – Fine-tunes on preference pairs to reduce hallucinations and improve helpfulness.
- Uncertainty Estimation – Monitors token-level entropy and escalates to safe responses when confidence drops below a threshold.
- RAG (Retrieval-Augmented Generation) – Retrieves relevant chunks from a local knowledge base (FAISS) to ground responses.
- Semantic Cache – Reuses previous generations for semantically similar queries, reducing latency and cost.
- Quantization & Pruning – Compress the model to 2-8 bits and prune unimportant weights; backend support for AutoGPTQ, llama.cpp (GGUF), and bitsandbytes.
- KV-Cache – Intelligent reuse of key/value states reduces inference latency by up to 78% (measured on local benchmarks).
- Soul.md Configuration – Define personality, sub-agent rules, proactive tasks, and prompt injection defenses in plain Markdown.
- Export Ready – One-click export to a Hugging Face Space (Docker-based) or a portable ZIP archive.
- Quantized (4-bit GGUF) – Optimized for memory-constrained environments; runs smoothly on CPU.
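The semantic-cache feature can be illustrated end to end. A production version would compare sentence-transformer embeddings; this dependency-free sketch substitutes bag-of-words cosine similarity, which is only a stand-in for real semantic matching:

```python
from collections import Counter
import math

def bow_vector(text):
    """Bag-of-words term counts (a crude stand-in for a sentence embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Reuse a cached answer when a new query is similar enough."""
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (query_vector, cached_answer)

    def get(self, query):
        qv = bow_vector(query)
        for v, answer in self.entries:
            if cosine(qv, v) >= self.threshold:
                return answer
        return None  # cache miss: fall through to the model

    def put(self, query, answer):
        self.entries.append((bow_vector(query), answer))
```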
Part 1: Installation
Included below are multiple guides for implementing Microclaw into your custom build, with steps to further train the GGUF file locally:
Read all steps carefully and pick the guide that matches your use case and setup; not all options may work on your system. These guides are designed specifically for Debian-based Linux systems.
1.1 Installation Guide + System Update & Basic Tools
sudo apt update
sudo apt upgrade -y
sudo apt install -y curl wget git build-essential
1.2 Install Docker (for containerized execution)
# Add Docker's official GPG key and repository
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Install Docker Engine
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io
# Add your user to the docker group (avoid sudo for every command)
sudo usermod -aG docker $USER
newgrp docker # activate group changes in current shell
1.3 Install Node.js (v22 or later) & TypeScript
# Using NodeSource repository for a modern Node.js version
curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
sudo apt install -y nodejs
# Install TypeScript globally
sudo npm install -g typescript
# Verify
node --version # should be v22.x or higher
tsc --version
1.4 Install SQLite (for memory & logs)
sudo apt install -y sqlite3 libsqlite3-dev
Part 2: Microclaw Fallback Agent
The Microclaw agent is a Python-based service (Flask + Transformers) that communicates with OpenClaw. You can install it using either a Python virtual environment (lightweight) or Conda (more reliable for PyTorch). Choose one method below.
2.1 Clone the Microclaw Repository
Create a parent directory for all agents:
sudo mkdir -p /opt/openclaw-agents
sudo chown -R $USER:$USER /opt/openclaw-agents
cd /opt/openclaw-agents
# Clone the Hugging Face repo (includes model files and soul configuration)
git lfs install
git clone https://huggingface.co/webxos/microclaw-for-openclaw-version-2026.2.17 microclaw-fallback
cd microclaw-fallback
Note: The .gguf model files are several hundred MB. If the download is interrupted, git lfs can resume. After cloning, verify the file sizes:
ls -lh *.gguf
They should be >100 MB, not 28 bytes. If they are still placeholders, run git lfs pull manually.
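A quick way to automate that check: a Git LFS pointer is a short text file beginning with a fixed version line, so a small helper can flag any .gguf that is still a pointer (is_lfs_pointer is an ad-hoc helper written here, not a git-lfs command):

```shell
# A Git LFS pointer file starts with this fixed version line.
is_lfs_pointer() {
  head -c 60 "$1" | grep -q "^version https://git-lfs"
}

for f in *.gguf; do
  [ -e "$f" ] || continue   # no .gguf files in this directory
  if is_lfs_pointer "$f"; then
    echo "POINTER: $f (run: git lfs pull)"
  else
    echo "OK: $f ($(du -h "$f" | cut -f1))"
  fi
done
```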
2.2 Option A: Install with Python Virtual Environment (venv)
# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate
# Upgrade pip and install dependencies
pip install --upgrade pip
pip install -r requirements.txt
# If requirements.txt is missing, install core packages manually
pip install flask transformers torch sentence-transformers faiss-cpu --extra-index-url https://download.pytorch.org/whl/cpu
2.3 Option B: Install with Conda (Recommended for unstable networks)
# Download and install Miniconda (if not already present)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3
source ~/miniconda3/bin/activate
# Create a dedicated environment with Python 3.11
conda create -y -n microclaw python=3.11
conda activate microclaw
# Install CPU-only PyTorch from the pytorch channel (smaller, more reliable)
conda install -y pytorch torchvision torchaudio cpuonly -c pytorch
# Install the rest via pip
pip install flask transformers sentence-transformers faiss-cpu
2.4 Test the Agent Manually
# Make sure you are in the agent directory with the environment activated
python main.py
You should see output like * Running on http://127.0.0.1:18789. Press Ctrl+C to stop it.
Part 3: Configure OpenClaw to Use the Microclaw Fallback
OpenClaw reads its configuration from a TOML file (typically ~/.config/openclaw/config.toml or /etc/openclaw/config.toml). You need to point it to your local Microclaw instance.
Find the port Microclaw listens on (default is 18789, defined in main.py):
grep port main.py
Edit the OpenClaw configuration (create it if it doesn't exist):
mkdir -p ~/.config/openclaw
nano ~/.config/openclaw/config.toml
Add or modify the [agent.fallback] section:
[agent.fallback]
path = "/opt/openclaw-agents/microclaw-fallback"
port = 18789
enabled = true
If OpenClaw is already installed, restart it. (If you haven't installed OpenClaw yet, see Part 4 below.)
Part 4: Install & Run OpenClaw (the main framework)
The OpenClaw core is a Node.js/TypeScript application. You can run it directly from source or use the provided Docker image.
4.1 Run OpenClaw via Docker (easiest)
# Pull the official OpenClaw image (adjust tag as needed)
docker pull openclaw/openclaw:latest
# Run the container, mounting the config and agents directories
docker run -d \
--name openclaw \
-p 3000:3000 \
-v ~/.config/openclaw:/home/node/.config/openclaw \
-v /opt/openclaw-agents:/opt/openclaw-agents \
openclaw/openclaw:latest
4.2 Run OpenClaw from Source (for development)
# Clone the OpenClaw repository
git clone https://github.com/openclaw/core.git openclaw-core
cd openclaw-core
# Install dependencies
yarn install
# Build TypeScript
yarn build
# Start OpenClaw (it reads the config from ~/.config/openclaw/config.toml)
yarn start
Part 5: Verify the Integration
Check that Microclaw is running (either manually or via systemd):
curl http://localhost:18789/health
Guide to Microclaw Auto-Start (systemd)
To ensure the fallback agent starts on boot and restarts if it crashes, create a systemd service.
Create the service file:
sudo nano /etc/systemd/system/microclaw-fallback.service
Paste (adjust User and paths to match your setup):
[Unit]
Description=Microclaw Fallback Agent for OpenClaw
After=network.target
[Service]
Type=simple
User=kali
WorkingDirectory=/opt/openclaw-agents/microclaw-fallback
Environment="PATH=/opt/openclaw-agents/microclaw-fallback/venv/bin"
ExecStart=/opt/openclaw-agents/microclaw-fallback/venv/bin/python /opt/openclaw-agents/microclaw-fallback/main.py
Restart=on-failure
RestartSec=10
[Install]
WantedBy=multi-user.target
Enable and start:
sudo systemctl daemon-reload
sudo systemctl enable microclaw-fallback.service
sudo systemctl start microclaw-fallback.service
Check status:
sudo systemctl status microclaw-fallback.service
ALTERNATIVE GUIDE: Installing via llama.cpp instead
Prerequisites: Essential System Tools
You need a few standard command-line tools. Open a terminal and run:
# Update your package list and install curl, wget, git, and build tools
sudo apt update && sudo apt upgrade -y
sudo apt install -y curl wget git build-essential
Step 1: Download the Model with Git LFS
The model files are hosted in a Git repository and require Git Large File Storage (LFS) to download the actual GGUF files.
1.1: Install Git LFS
sudo apt install -y git-lfs
git lfs install
1.2: Create a directory for your models and clone the repository
mkdir -p ~/models
cd ~/models
git clone https://huggingface.co/webxos/microclaw-for-openclaw-version-2026.2.17 microclaw-fallback
cd microclaw-fallback
1.3: Ensure the GGUF files are fully downloaded
git lfs pull
Verification: After cloning, check that the .gguf files are present and are a reasonable size (several hundred MB, not 28 bytes). Run:
ls -lh *.gguf
If the files are small placeholders, run git lfs pull again.
Step 2: Set Up the llama.cpp Server
Now, download, compile, and set up llama.cpp with its built-in server.
2.1: Clone the llama.cpp repository
cd ~/models
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
2.2: Compile llama.cpp (this may take a few minutes)
make -j4
2.3: (Optional but recommended) Install the Python dependencies for the server
This step requires Python/pip, but it's a one-time, isolated setup.
sudo apt install -y python3-pip python3-venv
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Step 3: Run the Model Server
Now, start the server, pointing it to the GGUF model file you downloaded. Make sure you are in the llama.cpp directory with the virtual env activated:
cd ~/models/llama.cpp
source venv/bin/activate
# Set MODEL_FILE to the exact GGUF filename (replace with the actual filename you have)
MODEL_FILE=~/models/microclaw-fallback/microclaw-for-openclaw-version-2026.2.17.Q4_K_M.gguf
# Run the server
./server -m $MODEL_FILE \
--host 0.0.0.0 \
--port 8000 \
-c 2048 \
-ngl 0 # Use -ngl 33 if you have an NVIDIA GPU and compiled with CUDA support
Explanation of flags:
-m $MODEL_FILE : Path to your GGUF model.
--host 0.0.0.0 : Listen on all network interfaces (so OpenClaw can connect).
--port 8000 : The port the server will use.
-c 2048 : Context size (adjust based on model requirements).
-ngl 0 : Number of layers to offload to GPU. Use -ngl 33 (or more) if you have an NVIDIA GPU and compiled with CUDA.
Keep this terminal window open. The server is now running and ready to accept requests.
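When picking -c, remember that the server's KV cache grows linearly with context length: two tensors (K and V) per layer, each holding n_kv_heads * head_dim values per position. The shapes below are assumed for a TinyLlama-class model, not read from your GGUF:

```python
def kv_cache_bytes(n_layers, n_ctx, n_kv_heads, head_dim, bytes_per_elem=2):
    """K and V tensors per layer, one (n_kv_heads * head_dim) vector per
    context position, stored here at 2 bytes per element (f16)."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

# Assumed TinyLlama-1.1B-like shapes: 22 layers, 4 KV heads, head_dim 64
print(kv_cache_bytes(22, 2048, 4, 64) // 2**20, "MiB")  # prints: 44 MiB
```

Doubling -c doubles this figure, which matters on an 8GB machine.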
Step 4: Test the Server
Open a new terminal and test the API to ensure it's working correctly.
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
"prompt": "What is the capital of France?",
"max_tokens": 50,
"temperature": 0.7
}'
You should receive a JSON response containing the model's generated text.
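The same request can be built from Python with only the standard library. The payload and endpoint path mirror the curl example above; actually sending it requires the server from Step 3 to be running:

```python
import json
from urllib import request

def build_completion_request(prompt, max_tokens=50, temperature=0.7,
                             base_url="http://localhost:8000"):
    """Build the same request the curl example sends (OpenAI-style payload)."""
    body = json.dumps({
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }).encode()
    return request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

# With the server from Step 3 running, this performs the same call as curl:
# with request.urlopen(build_completion_request("What is the capital of France?")) as r:
#     print(json.load(r)["choices"][0]["text"])
```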
Step 5: Configure OpenClaw to Use the Local Server
Now, configure OpenClaw to use this local server as its fallback agent.
Locate OpenClaw's configuration file. This is often ~/.config/openclaw/config.toml, /etc/openclaw/config.toml, or a .env file in the OpenClaw directory.
Edit the configuration to define a custom provider that points to your local server. The exact variable names depend on your OpenClaw version, but it generally looks something like this:
[agent.fallback]
provider = "custom" # or "openai-compatible"
base_url = "http://localhost:8000/v1"
api_key = "not-needed" # llama.cpp server doesn't require a key
model = "microclaw" # Optional: model name
enabled = true
If OpenClaw uses environment variables (e.g., in a .env file), you might set:
OPENCLAW_FALLBACK_PROVIDER=custom
OPENCLAW_CUSTOM_BASE_URL=http://localhost:8000/v1
OPENCLAW_CUSTOM_API_KEY=not-needed
Restart OpenClaw for the changes to take effect.
How to Run the Server as a Background Service
To have the server start automatically on boot and restart if it crashes, you can create a systemd service.
Create the service file:
sudo nano /etc/systemd/system/microclaw-llama.service
Paste the following (adjust User, WorkingDirectory, and ExecStart paths as needed):
[Unit]
Description=llama.cpp server for Microclaw
After=network.target
[Service]
Type=simple
User=kali
WorkingDirectory=/home/kali/models/llama.cpp
ExecStart=/home/kali/models/llama.cpp/server -m /home/kali/models/microclaw-fallback/microclaw-for-openclaw-version-2026.2.17.Q4_K_M.gguf --host 0.0.0.0 --port 8000 -c 2048 -ngl 0
Restart=on-failure
RestartSec=10
[Install]
WantedBy=multi-user.target
Then enable and start the service:
sudo systemctl daemon-reload
sudo systemctl enable microclaw-llama.service
sudo systemctl start microclaw-llama.service
sudo systemctl status microclaw-llama.service # Check if it's running
ADVANCED GUIDE: TRAINING MICROCLAW.GGUF MODEL LOCALLY
Ultra-Lightweight Local Training & Deployment Guide
Optimized for CPU-only systems (8GB RAM, no GPU) – Raspberry Pi ready
This guide adapts the full microclaw pipeline to run entirely on a low-end machine like an 8GB RAM laptop or even a Raspberry Pi 5. We'll use a tiny base model (0.5B-1B parameters), parameter-efficient fine-tuning (LoRA) on CPU, and extreme quantization (2-bit) to produce a GGUF file that runs smoothly on consumer hardware.
The final system provides:
- A local training script that fits in 8GB RAM (CPU only).
- A FastAPI server (server.py) serving a retro MS-DOS-style CLI dashboard on localhost:8080.
- Local API endpoints for inference, file management, cron jobs, and RAG.
- SQLite as a local database (conversation history, cache, RAG index).
- Integration with llama.cpp for efficient GGUF inference.
Folder Structure (to be created)
/home/kali/microclaw/
├── server.py          # FastAPI server (inference + static files + API)
├── train.py           # CPU-optimized fine-tuning + DPO script
├── requirements.txt
├── config.yaml
├── data/
│   ├── raw/           # Place your JSONL datasets here
│   └── rag_docs/      # Text files for RAG (optional)
├── models/
│   ├── base/          # Will contain the downloaded base model
│   ├── lora/          # LoRA adapters after training
│   └── microclaw.gguf # Final quantized model (after conversion)
├── static/
│   ├── index.html     # Main dashboard (CLI style)
│   ├── style.css
│   ├── script.js
│   └── pages/         # Additional pages (file manager, cron, etc.)
│       ├── files.html
│       ├── cron.html
│       └── rag.html
├── db/
│   └── microclaw.db   # SQLite database (auto-created)
└── logs/
    └── training.log
Prerequisites
- Hardware: x86_64 or ARM64 (Raspberry Pi 5) with at least 8GB RAM.
- OS: Debian 12 / Kali Linux / Raspberry Pi OS (64-bit).
- Storage: 10GB free space.
- Software: Python 3.10+, Git, CMake, build tools.
Step 1: Environment Setup
cd /home/kali/microclaw
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
requirements.txt (CPU-optimized, no CUDA dependencies):
--extra-index-url https://download.pytorch.org/whl/cpu
torch==2.2.0+cpu
transformers>=4.38.0
accelerate
datasets
trl>=0.8.0
peft
bitsandbytes
scipy
sentencepiece
protobuf
fastapi
uvicorn
sqlite-utils
pydantic
pyyaml
jinja2
aiofiles
llama-cpp-python
Step 2: Build llama.cpp (for conversion & inference)
llama.cpp provides the tools to convert Hugging Face models to GGUF and run them efficiently on CPU.
cd /home/kali
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build && cd build
cmake .. -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS # optional: enables BLAS for speed
make -j$(nproc)
After compilation, the convert-hf-to-gguf.py script will be in llama.cpp/ (not in build). We'll use it later.
Step 3: Prepare the Dataset
You need a small dataset (a few hundred to a few thousand examples) for fineβtuning and DPO. Place JSONL files in data/raw/.
3.1 Tool-use data (schema-first)
Each line:
{
"instruction": "List files in /home",
"tools": ["ls"],
"response": "ls /home"
}
3.2 Preference data (for DPO)
Each line:
{
"prompt": "What is the weather?",
"chosen": "I cannot check live weather, but you can use the 'weather' tool.",
"rejected": "I don't know."
}
If you don't have preference data, you can skip DPO by setting dpo: false in config.
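Before training on these files, a quick schema check catches malformed lines early; the required keys below are taken from the two examples above (validate_jsonl is a hypothetical helper, not part of the repo):

```python
import json

REQUIRED = {
    "tool_use": {"instruction", "tools", "response"},
    "preference": {"prompt", "chosen", "rejected"},
}

def validate_jsonl(lines, kind):
    """Return 1-based line numbers whose records are missing required keys."""
    bad = []
    for i, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if not REQUIRED[kind] <= record.keys():
            bad.append(i)
    return bad
```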
3.3 RAG documents (optional)
Place plain text files (.txt) in data/rag_docs/. The training script will chunk them and store embeddings in SQLite.
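The chunking step can be sketched with character windows that mirror the chunk_size and chunk_overlap settings from config.yaml (the actual script may split on tokens or sentences instead):

```python
def chunk_text(text, chunk_size=500, chunk_overlap=50):
    """Split text into overlapping windows; consecutive chunks share
    chunk_overlap characters, matching the config.yaml settings."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    i = 0
    while i < len(text):
        chunks.append(text[i:i + chunk_size])
        i += step
    return chunks
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side.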
Step 4: Configuration (config.yaml)
Edit this file to match your paths and training preferences.
# config.yaml
model:
base_model_name: "TinyLlama/TinyLlama-1.1B-Chat-v1.0" # or "Qwen/Qwen2.5-0.5B"
cache_dir: "models/base"
training:
output_dir: "models/lora"
per_device_train_batch_size: 1
gradient_accumulation_steps: 4
learning_rate: 2e-4
num_train_epochs: 3
max_seq_length: 512
use_lora: true
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
dpo: true
dpo_beta: 0.1
# CPU optimizations
dataloader_num_workers: 0
save_steps: 100
logging_steps: 10
data:
train_file: "data/raw/train.jsonl"
eval_file: "data/raw/eval.jsonl" # optional
preference_file: "data/raw/preferences.jsonl" # for DPO
rag:
enabled: true
chunk_size: 500
chunk_overlap: 50
embedding_model: "all-MiniLM-L6-v2" # tiny, runs on CPU
db_path: "db/microclaw.db"
server:
host: "0.0.0.0"
port: 8080
model_path: "models/microclaw.gguf"
context_size: 2048
max_tokens: 512
temperature: 0.7
Step 5: Training Script (train.py)
This script performs supervised fine-tuning (SFT) on instruction data, optionally followed by DPO, and finally merges the LoRA weights and saves the full model. It is heavily optimized for low-RAM (CPU) usage.
#!/usr/bin/env python3
# train.py – CPU-only fine-tuning with LoRA + optional DPO
import os
import yaml
import torch
from transformers import (
AutoTokenizer,
AutoModelForCausalLM,
TrainingArguments,
Trainer,
BitsAndBytesConfig
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training, TaskType
from datasets import load_dataset
from trl import DPOTrainer
import logging
# Load config
with open("config.yaml") as f:
config = yaml.safe_load(f)
# Setup logging
logging.basicConfig(level=logging.INFO, filename="logs/training.log", filemode="w")
logger = logging.getLogger(__name__)
def main():
# 1. Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(config["model"]["base_model_name"], cache_dir=config["model"]["cache_dir"])
tokenizer.pad_token = tokenizer.eos_token
# 2. Load base model in 8-bit (CPU offload not supported for bitsandbytes on CPU; we use standard dtype)
# For CPU, we load in float32 and rely on LoRA to reduce memory.
model = AutoModelForCausalLM.from_pretrained(
config["model"]["base_model_name"],
cache_dir=config["model"]["cache_dir"],
torch_dtype=torch.float32, # CPU uses float32
low_cpu_mem_usage=True
)
# 3. Prepare LoRA
if config["training"]["use_lora"]:
lora_config = LoraConfig(
task_type=TaskType.CAUSAL_LM,
r=config["training"]["lora_r"],
lora_alpha=config["training"]["lora_alpha"],
lora_dropout=config["training"]["lora_dropout"],
target_modules=["q_proj", "v_proj"] # adjust for your model
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# 4. Load dataset
dataset = load_dataset("json", data_files=config["data"]["train_file"], split="train")
if config["data"].get("eval_file"):
eval_dataset = load_dataset("json", data_files=config["data"]["eval_file"], split="train")
else:
eval_dataset = None
# Format prompt: "### Instruction:\n{instruction}\n\n### Response:\n{response}"
def format_func(example):
text = f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['response']}{tokenizer.eos_token}"
return {"text": text}
dataset = dataset.map(format_func)
if eval_dataset:
eval_dataset = eval_dataset.map(format_func)
# Tokenize
def tokenize(element):
return tokenizer(element["text"], truncation=True, max_length=config["training"]["max_seq_length"], padding=False)
dataset = dataset.map(tokenize, remove_columns=dataset.column_names)
if eval_dataset:
eval_dataset = eval_dataset.map(tokenize, remove_columns=eval_dataset.column_names)
# 5. Training arguments (CPU-friendly)
training_args = TrainingArguments(
output_dir=config["training"]["output_dir"],
per_device_train_batch_size=config["training"]["per_device_train_batch_size"],
gradient_accumulation_steps=config["training"]["gradient_accumulation_steps"],
learning_rate=config["training"]["learning_rate"],
num_train_epochs=config["training"]["num_train_epochs"],
logging_steps=config["training"]["logging_steps"],
save_steps=config["training"]["save_steps"],
evaluation_strategy="steps" if eval_dataset else "no",
eval_steps=config["training"]["save_steps"],
save_total_limit=2,
load_best_model_at_end=True if eval_dataset else False,
metric_for_best_model="eval_loss",
greater_is_better=False,
fp16=False, # CPU doesn't support fp16
bf16=False,
dataloader_num_workers=0, # avoid multiprocessing issues
optim="adamw_torch",
torch_compile=False, # no speedup on CPU
)
# 6. Trainer (SFT)
trainer = Trainer(
model=model,
args=training_args,
train_dataset=dataset,
eval_dataset=eval_dataset,
tokenizer=tokenizer,
)
logger.info("Starting SFT training...")
trainer.train()
trainer.save_model() # saves LoRA adapters
# 7. Optional DPO training
if config["training"]["dpo"] and config["data"].get("preference_file"):
logger.info("Loading preference data for DPO...")
pref_dataset = load_dataset("json", data_files=config["data"]["preference_file"], split="train")
# For DPO we need base model without LoRA (or merged)
# We'll reload base model and then apply LoRA weights
# (Simplified: use the same model with LoRA attached; DPO trainer handles it)
dpo_trainer = DPOTrainer(
model=model,
ref_model=None, # uses model as reference (or you can provide a frozen copy)
args=training_args, # reuse same args (adjust for DPO)
train_dataset=pref_dataset,
tokenizer=tokenizer,
beta=config["training"]["dpo_beta"],
max_length=config["training"]["max_seq_length"],
max_prompt_length=256,
)
logger.info("Starting DPO training...")
dpo_trainer.train()
dpo_trainer.save_model(config["training"]["output_dir"] + "_dpo")
# 8. Merge LoRA and save full model (for conversion)
logger.info("Merging LoRA weights...")
merged_model = model.merge_and_unload()
merged_model.save_pretrained("models/merged")
tokenizer.save_pretrained("models/merged")
logger.info("Merged model saved to models/merged")
if __name__ == "__main__":
main()
Run training:
python train.py
Note: Training a 1B model on CPU with batch size 1 may take several hours to days depending on dataset size. Reduce epochs or dataset size for testing.
Step 6: Convert to GGUF
After training, we have a merged Hugging Face model in models/merged/. Now use llama.cpp's conversion script.
cd /home/kali/llama.cpp
# Convert the merged Hugging Face model to a full-precision GGUF first
python convert-hf-to-gguf.py /home/kali/microclaw/models/merged \
  --outfile /home/kali/microclaw/models/microclaw-f16.gguf \
  --outtype f16
# Then quantize down to 2-bit with the tool built in Step 2 (the binary is
# named quantize or llama-quantize depending on your llama.cpp version)
./build/bin/quantize /home/kali/microclaw/models/microclaw-f16.gguf \
  /home/kali/microclaw/models/microclaw.gguf Q2_K
For a Raspberry Pi, Q2_K is ideal. You can also try Q3_K_S if you have more RAM.
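As a sanity check on the resulting file size, parameters times bits-per-weight gives a rough lower bound; the ~2.56 bits/weight average used here for Q2_K is an approximate, assumed figure, since mixed-precision tensors and metadata add overhead:

```python
def gguf_size_gib(n_params, bits_per_weight):
    """Rough model file size: parameters * bits per weight, ignoring
    metadata and the few tensors kept at higher precision."""
    return n_params * bits_per_weight / 8 / 2**30

# ~1.1B parameters at an assumed ~2.56 bits/weight for Q2_K
print(f"{gguf_size_gib(1.1e9, 2.56):.2f} GiB")  # roughly a third of a GiB
```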
Step 7: Build the FastAPI Server (server.py)
This server serves:
- Static files (the CLI dashboard) from the static/ folder.
- API endpoints for inference, file management, cron, and RAG.
- SQLite database for conversation history and RAG cache.
#!/usr/bin/env python3
# server.py – FastAPI server with GGUF inference and static dashboard
import os
import yaml
import sqlite3
import json
from pathlib import Path
from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import HTMLResponse, JSONResponse
from fastapi.staticfiles import StaticFiles
from pydantic import BaseModel
from typing import Optional, List
import uvicorn
from llama_cpp import Llama
# Load config
with open("config.yaml") as f:
config = yaml.safe_load(f)
# Initialize SQLite
DB_PATH = config["rag"]["db_path"]
conn = sqlite3.connect(DB_PATH, check_same_thread=False)
cursor = conn.cursor()
cursor.execute("""
CREATE TABLE IF NOT EXISTS history (
id INTEGER PRIMARY KEY AUTOINCREMENT,
prompt TEXT,
response TEXT,
timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
)
""")
cursor.execute("""
CREATE TABLE IF NOT EXISTS rag_cache (
id INTEGER PRIMARY KEY AUTOINCREMENT,
query TEXT UNIQUE,
chunks TEXT,
embedding BLOB
)
""")
conn.commit()
# Load GGUF model
model_path = config["server"]["model_path"]
llm = Llama(
model_path=model_path,
n_ctx=config["server"]["context_size"],
n_threads=os.cpu_count(),
n_gpu_layers=0, # CPU only
verbose=False,
)
app = FastAPI(title="microclaw Gateway")
# Mount static files
app.mount("/static", StaticFiles(directory="static"), name="static")
# API Models
class PromptRequest(BaseModel):
prompt: str
max_tokens: Optional[int] = 256
temperature: Optional[float] = 0.7
use_rag: Optional[bool] = False
class ToolRequest(BaseModel):
tool: str
args: dict
# Simple RAG (placeholder – you can enhance with embeddings)
def retrieve_chunks(query: str) -> str:
# For demo, just return static text; real implementation would use embeddings
return "Relevant document chunk about file management."
@app.get("/", response_class=HTMLResponse)
async def root():
with open("static/index.html") as f:
return f.read()
@app.post("/api/chat")
async def chat(req: PromptRequest):
# Optionally enhance prompt with RAG
if req.use_rag:
context = retrieve_chunks(req.prompt)
augmented_prompt = f"Context: {context}\n\nQuestion: {req.prompt}\nAnswer:"
else:
augmented_prompt = req.prompt
# Call model
output = llm(
augmented_prompt,
max_tokens=req.max_tokens,
temperature=req.temperature,
stop=["</s>", "###"],
echo=False
)
response = output["choices"][0]["text"].strip()
# Save to history
cursor.execute("INSERT INTO history (prompt, response) VALUES (?, ?)", (req.prompt, response))
conn.commit()
return {"response": response}
@app.get("/api/history")
async def get_history(limit: int = 50):
cursor.execute("SELECT prompt, response, timestamp FROM history ORDER BY timestamp DESC LIMIT ?", (limit,))
rows = cursor.fetchall()
return [{"prompt": r[0], "response": r[1], "timestamp": r[2]} for r in rows]
@app.post("/api/tool")
async def run_tool(req: ToolRequest):
# Example: execute system commands (sandboxed)
if req.tool == "ls":
path = req.args.get("path", ".")
try:
files = os.listdir(path)
return {"output": "\n".join(files)}
except Exception as e:
return {"error": str(e)}
elif req.tool == "cron_list":
# Parse crontab (requires user permissions)
# For demo, return placeholder
return {"output": "0 5 * * * /home/kali/backup.sh"}
else:
return {"error": "Unknown tool"}
if __name__ == "__main__":
uvicorn.run(app, host=config["server"]["host"], port=config["server"]["port"])
Step 8: Run the Server
cd /home/kali/microclaw
source venv/bin/activate
python server.py
Open your browser to http://localhost:8080 and start interacting.
Step 9: Create the Training Script (train.py)
#!/usr/bin/env python3
# train.py - CPU-only fine-tuning with LoRA + optional DPO
import os
import yaml
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    TrainingArguments,
    Trainer
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training, TaskType
from datasets import load_dataset
from trl import DPOTrainer
import logging
# Load config
with open("config.yaml") as f:
config = yaml.safe_load(f)
# Setup logging (create the log directory first so basicConfig does not fail)
os.makedirs("logs", exist_ok=True)
logging.basicConfig(level=logging.INFO, filename="logs/training.log", filemode="w")
logger = logging.getLogger(__name__)
def main():
# 1. Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(config["model"]["base_model_name"], cache_dir=config["model"]["cache_dir"])
tokenizer.pad_token = tokenizer.eos_token
# 2. Load base model in 8-bit (CPU offload not supported for bitsandbytes on CPU; we use standard dtype)
# For CPU, we load in float32 and rely on LoRA to reduce memory.
model = AutoModelForCausalLM.from_pretrained(
config["model"]["base_model_name"],
cache_dir=config["model"]["cache_dir"],
torch_dtype=torch.float32, # CPU uses float32
low_cpu_mem_usage=True
)
# 3. Prepare LoRA
if config["training"]["use_lora"]:
lora_config = LoraConfig(
task_type=TaskType.CAUSAL_LM,
r=config["training"]["lora_r"],
lora_alpha=config["training"]["lora_alpha"],
lora_dropout=config["training"]["lora_dropout"],
target_modules=["q_proj", "v_proj"] # adjust for your model
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# 4. Load dataset
dataset = load_dataset("json", data_files=config["data"]["train_file"], split="train")
if config["data"].get("eval_file"):
eval_dataset = load_dataset("json", data_files=config["data"]["eval_file"], split="train")
else:
eval_dataset = None
# Format prompt: "### Instruction:\n{instruction}\n\n### Response:\n{response}"
def format_func(example):
text = f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['response']}{tokenizer.eos_token}"
return {"text": text}
dataset = dataset.map(format_func)
if eval_dataset:
eval_dataset = eval_dataset.map(format_func)
# Tokenize
def tokenize(element):
return tokenizer(element["text"], truncation=True, max_length=config["training"]["max_seq_length"], padding=False)
dataset = dataset.map(tokenize, remove_columns=dataset.column_names)
if eval_dataset:
eval_dataset = eval_dataset.map(tokenize, remove_columns=eval_dataset.column_names)
    # 5. Training arguments (CPU-friendly)
training_args = TrainingArguments(
output_dir=config["training"]["output_dir"],
per_device_train_batch_size=config["training"]["per_device_train_batch_size"],
gradient_accumulation_steps=config["training"]["gradient_accumulation_steps"],
learning_rate=config["training"]["learning_rate"],
num_train_epochs=config["training"]["num_train_epochs"],
logging_steps=config["training"]["logging_steps"],
save_steps=config["training"]["save_steps"],
evaluation_strategy="steps" if eval_dataset else "no",
eval_steps=config["training"]["save_steps"],
save_total_limit=2,
load_best_model_at_end=True if eval_dataset else False,
metric_for_best_model="eval_loss",
greater_is_better=False,
fp16=False, # CPU doesn't support fp16
bf16=False,
dataloader_num_workers=0, # avoid multiprocessing issues
optim="adamw_torch",
torch_compile=False, # no speedup on CPU
)
    # 6. Trainer (SFT). A causal-LM collator is required so the tokenized
    # inputs get "labels"; without it the Trainer has no loss to compute.
    from transformers import DataCollatorForLanguageModeling
    data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=dataset,
        eval_dataset=eval_dataset,
        data_collator=data_collator,
        tokenizer=tokenizer,
    )
logger.info("Starting SFT training...")
trainer.train()
trainer.save_model() # saves LoRA adapters
# 7. Optional DPO training
if config["training"]["dpo"] and config["data"].get("preference_file"):
logger.info("Loading preference data for DPO...")
pref_dataset = load_dataset("json", data_files=config["data"]["preference_file"], split="train")
# For DPO we need base model without LoRA (or merged)
# We'll reload base model and then apply LoRA weights
# (Simplified: use the same model with LoRA attached; DPO trainer handles it)
dpo_trainer = DPOTrainer(
model=model,
ref_model=None, # uses model as reference (or you can provide a frozen copy)
args=training_args, # reuse same args (adjust for DPO)
train_dataset=pref_dataset,
tokenizer=tokenizer,
beta=config["training"]["dpo_beta"],
max_length=config["training"]["max_seq_length"],
max_prompt_length=256,
)
logger.info("Starting DPO training...")
dpo_trainer.train()
dpo_trainer.save_model(config["training"]["output_dir"] + "_dpo")
# 8. Merge LoRA and save full model (for conversion)
logger.info("Merging LoRA weights...")
merged_model = model.merge_and_unload()
merged_model.save_pretrained("models/merged")
tokenizer.save_pretrained("models/merged")
logger.info("Merged model saved to models/merged")
if __name__ == "__main__":
main()
Run training:
python train.py
Note: Training a 1B model on CPU with batch size 1 may take several hours to days depending on dataset size. Reduce epochs or dataset size for testing.
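train.py reads its training set from the JSONL file named in config.yaml's data.train_file, one instruction/response object per line. A sketch that writes a tiny dataset and reproduces the prompt template applied by format_func (the data/train.jsonl path and the example pairs are assumptions for illustration; train.py uses the tokenizer's real eos token rather than the literal "</s>" shown here):

```python
import json
from pathlib import Path

# Two hypothetical training pairs in the instruction/response schema
# that train.py's format_func expects.
examples = [
    {"instruction": "List the files in /tmp.",
     "response": "Use the ls tool with path=/tmp."},
    {"instruction": "What does cron do?",
     "response": "cron runs scheduled commands at fixed times."},
]

path = Path("data/train.jsonl")
path.parent.mkdir(parents=True, exist_ok=True)
with path.open("w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")  # one JSON object per line

# The same template train.py applies before tokenization
def format_example(ex: dict, eos: str = "</s>") -> str:
    return f"### Instruction:\n{ex['instruction']}\n\n### Response:\n{ex['response']}{eos}"

print(format_example(examples[0]))
```

Keep the "### Instruction / ### Response" markers identical between training data and inference prompts; the server's stop=["###"] setting relies on them.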
Step 10: Convert to GGUF
After training, we have a merged Hugging Face model in models/merged/. Now use llama.cpp's conversion script (convert_hf_to_gguf.py in recent checkouts; older ones name it convert-hf-to-gguf.py). The converter itself cannot emit 2-bit files, so convert to f16 first and then quantize with llama-quantize:
cd /home/kali/llama.cpp
python convert_hf_to_gguf.py /home/kali/microclaw/models/merged \
  --outfile /home/kali/microclaw/models/microclaw-f16.gguf \
  --outtype f16
./build/bin/llama-quantize /home/kali/microclaw/models/microclaw-f16.gguf \
  /home/kali/microclaw/models/microclaw.gguf Q2_K
For Raspberry Pi, Q2_K is ideal (extremely small). You can also try Q3_K_S if you have more RAM.
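A GGUF file begins with the 4-byte ASCII magic "GGUF" followed by a little-endian uint32 format version, so you can sanity-check the conversion output before wiring it into the server. A small sketch (the file path in the usage note is illustrative):

```python
import struct

def gguf_header(path: str) -> tuple[bytes, int]:
    # Read the 4-byte magic and the uint32 version that follows it
    with open(path, "rb") as f:
        magic = f.read(4)
        (version,) = struct.unpack("<I", f.read(4))
    return magic, version

def looks_like_gguf(path: str) -> bool:
    magic, version = gguf_header(path)
    return magic == b"GGUF" and version >= 1
```

Usage: looks_like_gguf("models/microclaw.gguf") should return True for a valid conversion; a False here usually means the conversion step was interrupted or wrote a non-GGUF format.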
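The cron_list tool in server.py returns a canned placeholder line. A sketch of reading the real user crontab via subprocess instead (assumes the crontab binary is available; parse_crontab and cron_list are hypothetical helper names, not part of the repo):

```python
import subprocess

def parse_crontab(text: str) -> list:
    # Keep only active job lines; drop comments and blank lines
    return [ln for ln in text.splitlines()
            if ln.strip() and not ln.lstrip().startswith("#")]

def cron_list() -> dict:
    # crontab -l exits non-zero when the user has no crontab
    proc = subprocess.run(["crontab", "-l"], capture_output=True, text=True)
    if proc.returncode != 0:
        return {"error": proc.stderr.strip() or "no crontab for this user"}
    return {"output": "\n".join(parse_crontab(proc.stdout))}
```

Wiring this into the /api/tool handler keeps the same {"output": ...} / {"error": ...} response shape the dashboard already expects.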
Step 12: Create the Retro CLI Dashboard
static/index.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>microclaw v2026.2.17 - CLI Gateway</title>
<link rel="stylesheet" href="/static/style.css">
</head>
<body>
<div class="terminal">
<div class="header">microclaw [Version 2026.2.21] β Local Fallback Agent</div>
<div class="output" id="output">
<div>> System ready. Type a command or question.</div>
<div>> Use /help for available commands.</div>
</div>
<div class="input-line">
<span class="prompt">$></span>
<input type="text" id="input" autofocus>
</div>
</div>
<script src="/static/script.js"></script>
</body>
</html>
static/style.css
body {
background: #000;
color: #0f0;
font-family: 'Courier New', monospace;
margin: 0;
padding: 20px;
}
.terminal {
max-width: 900px;
margin: auto;
border: 2px solid #0f0;
padding: 10px;
height: 80vh;
display: flex;
flex-direction: column;
}
.header {
border-bottom: 1px solid #0f0;
padding-bottom: 5px;
margin-bottom: 10px;
text-align: center;
font-weight: bold;
}
.output {
flex: 1;
overflow-y: auto;
white-space: pre-wrap;
margin-bottom: 10px;
}
.input-line {
display: flex;
border-top: 1px solid #0f0;
padding-top: 5px;
}
.prompt {
margin-right: 5px;
}
#input {
background: #000;
border: none;
color: #0f0;
font-family: 'Courier New', monospace;
font-size: 1em;
flex: 1;
outline: none;
}
static/script.js
const input = document.getElementById('input');
const output = document.getElementById('output');
input.addEventListener('keydown', async (e) => {
if (e.key === 'Enter') {
const cmd = input.value.trim();
input.value = '';
addLine(`$> ${cmd}`);
await processCommand(cmd);
}
});
async function processCommand(cmd) {
if (cmd === '/help') {
addLine('Available commands:');
addLine('  /chat <question> - ask the model');
addLine('  /ls [path] - list files');
addLine('  /cron - show cron jobs');
addLine('  /history - show chat history');
addLine('  /clear - clear screen');
return;
}
if (cmd === '/clear') {
output.innerHTML = '';
return;
}
if (cmd.startsWith('/chat ')) {
const prompt = cmd.slice(6);
addLine('... thinking ...');
try {
const res = await fetch('/api/chat', {
method: 'POST',
headers: {'Content-Type': 'application/json'},
body: JSON.stringify({prompt, use_rag: false})
});
const data = await res.json();
addLine(data.response);
} catch (err) {
addLine('Error: ' + err);
}
return;
}
if (cmd === '/history') {
try {
const res = await fetch('/api/history');
const history = await res.json();
history.forEach(item => {
addLine(`[${item.timestamp}] Q: ${item.prompt}`);
addLine(`A: ${item.response}`);
});
} catch (err) {
addLine('Error: ' + err);
}
return;
}
if (cmd.startsWith('/ls')) {
const parts = cmd.split(' ');
const path = parts[1] || '.';
try {
const res = await fetch('/api/tool', {
method: 'POST',
headers: {'Content-Type': 'application/json'},
body: JSON.stringify({tool: 'ls', args: {path}})
});
const data = await res.json();
addLine(data.output || data.error);
} catch (err) {
addLine('Error: ' + err);
}
return;
}
if (cmd === '/cron') {
try {
const res = await fetch('/api/tool', {
method: 'POST',
headers: {'Content-Type': 'application/json'},
body: JSON.stringify({tool: 'cron_list', args: {}})
});
const data = await res.json();
addLine(data.output || data.error);
} catch (err) {
addLine('Error: ' + err);
}
return;
}
addLine(`Unknown command: ${cmd}. Type /help.`);
}
function addLine(text) {
const line = document.createElement('div');
line.textContent = text;
output.appendChild(line);
output.scrollTop = output.scrollHeight;
}
You can add more pages (static/pages/files.html, static/pages/cron.html) and link them from the CLI with commands like /open files, but for simplicity we'll keep the single-page CLI.
Troubleshooting
Out of memory during training: Reduce max_seq_length, batch size, or use a smaller base model (e.g., Qwen2.5-0.5B).
Slow inference: Ensure you compiled llama.cpp with OpenBLAS. Use fewer CPU threads if needed (n_threads=4).
GGUF conversion errors: Make sure you have the correct transformers version and that the merged model is saved properly.
Model file not found: Ensure the path in the -m flag is correct. Use the absolute path.
Port already in use: Change the --port value (e.g., to 8001) and update your OpenClaw configuration.
Server starts but responds slowly: This is normal on CPU. You can try a smaller, more quantized GGUF variant from the Hugging Face repo (e.g., Q2_K for 2-bit).
If 'git lfs pull' fails or is slow: interrupted downloads resume when you run the command again.
OpenClaw cannot connect: Verify the server is running with curl (as in Step 4). Check any firewall rules. If OpenClaw is in a Docker container, ensure they are on the same network (using --network host for the OpenClaw container is the simplest solution).
If pkg-config is missing: CMake uses pkg-config to locate some libraries. Install it and rebuild:
sudo apt update
sudo apt install pkg-config
cd /home/kali/llama.cpp
rm -rf build # clean previous attempt
cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
cmake --build build --config Release -j$(nproc)
This should now complete successfully. If you still encounter issues, you can temporarily disable BLAS to get a working build:
cd /home/kali/llama.cpp
rm -rf build
cmake -B build
cmake --build build --config Release -j$(nproc)
After building, you'll have build/bin/llama-server, and the conversion script convert_hf_to_gguf.py (convert-hf-to-gguf.py in older checkouts) in the main llama.cpp directory.
The "illegal hardware instruction" error: indicates that the PyTorch build you're using is trying to execute CPU instructions (like AVX2) that your processor does not support. This is common on older CPUs or virtual machines. Let's diagnose and fix it.
1. Check your CPU's instruction set
Run this command to see what your CPU supports:
lscpu | grep -E "Model name|Flags"
Look for flags like avx, avx2, sse4_1, etc. If you don't see avx2, that's the problem.
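The same check can be done from Python before importing torch, by parsing /proc/cpuinfo (Linux-only; the helper names are illustrative):

```python
def cpu_flags(cpuinfo_text: str) -> set:
    # The "flags" line in /proc/cpuinfo lists supported instruction sets
    for line in cpuinfo_text.splitlines():
        if line.lower().startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

def supports_avx2() -> bool:
    # Read the live CPU description on Linux
    with open("/proc/cpuinfo") as f:
        return "avx2" in cpu_flags(f.read())
```

If supports_avx2() returns False, the standard PyTorch wheels will crash exactly as described, and you need one of the options below.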
2. Install a PyTorch version compatible with your CPU
The standard PyTorch wheels from the official site require AVX2. You have two options:
Option A: Install PyTorch from conda-forge (recommended)
Conda-forge often provides more compatible builds, including for older CPUs.
# Install Miniconda if you haven't
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3
source ~/miniconda3/bin/activate
# Create a new environment with Python 3.10
conda create -y -n microclaw python=3.10
conda activate microclaw
# Install PyTorch CPU-only from conda-forge
conda install -y pytorch cpuonly -c pytorch  # official channel; these builds may also require AVX2
# Better: conda-forge ships CPU-only builds that tend to support older CPUs
conda install -y pytorch-cpu -c conda-forge
If that still fails, we can try building PyTorch from source with older instruction sets, but that's complex.
Option B: Use the PyTorch wheels with no AVX requirements
There are community builds that target older CPUs. For example, the manylinux2014 wheels might work. Try:
pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cpu --no-deps
But the official wheels likely require AVX2. You could try an older PyTorch version (e.g., 1.13) which may have broader support.
Option C: Use llama.cpp for training as well
Since llama.cpp is pure C++ and can be compiled for any CPU, you could use it for training too. However, the finetune tool is not always present: it has been removed from recent llama.cpp releases, so this route only works with a checkout that still ships the finetune example.
First, check whether your checkout includes it:
cd ~/llama.cpp
ls examples/finetune
If it is there, build it with flags matched to your CPU:
mkdir -p build && cd build
cmake .. -DCMAKE_C_FLAGS="-march=native" -DCMAKE_CXX_FLAGS="-march=native"
make -j$(nproc) finetune
After that, the finetune binary should appear in build/bin/. Then you can train using the command we discussed earlier.
3. If you still want to use Python, consider using a smaller model
TinyLlama is already small, but the error is at the PyTorch level, not the model. So changing the model won't help. You need a compatible PyTorch.
4. Verify your Python environment
Maybe the virtual environment is using a system Python that has a broken PyTorch. Try creating a fresh venv with Python 3.10 and reinstalling all packages.
cd ~/microclaw
deactivate
rm -rf venv
python3.10 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt
Then run python train.py again.
If none of these work, please provide the output of lscpu so we can tailor a solution.
LICENSE
Apache 2.0 License.
Model tree for webxos/microclaw-for-openclaw-version-2026.2.17
Base model: openai-community/gpt2