Gemma4-E4B-Function-Calling-xLAM-Unsloth

This model is a fine-tuned version of Gemma4-E4B-it, optimized for function calling. It was trained with Unsloth, which reports roughly 2x faster training and up to 60% less VRAM than standard fine-tuning.

Trained on the Salesforce/xlam-function-calling-60k dataset, which contains 60,000 function-calling examples. Each example pairs a natural-language query with tool definitions and the structured tool calls that answer it.

Overview

Property Value
Developed by ermiaazarkhalili
License GEMMA
Language English
Base Model Gemma4-E4B-it
Model Size 4B effective parameters (E4B); the Safetensors weights hold about 8B parameters in total
Training Framework Unsloth + TRL
Training Method SFT with QLoRA (4-bit)
Max Sequence Length (fine-tuning) 2,048 tokens
GGUF Available Gemma4-E4B-Function-Calling-xLAM-Unsloth-GGUF

Training Configuration

SFT + LoRA Settings

Parameter Value
Unsloth Class FastModel
Chat Template gemma-4
Learning Rate 2e-4
Batch Size 2 per device
Gradient Accumulation 4 steps
Effective Batch Size 8
Max Steps 1,000
Optimizer AdamW 8-bit
LR Scheduler Linear
Warmup Steps 5
Precision Auto (BF16/FP16)
Gradient Checkpointing Enabled (Unsloth optimized)
Seed 3407
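The batch and schedule settings above combine as follows. This is a minimal plain-Python sketch that recomputes the numbers rather than calling the trainer. It assumes a single GPU, no sequence packing, and the linear-warmup/linear-decay formula used by Transformers' `get_linear_schedule_with_warmup`.

```python
# Settings copied from the table above
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 4
NUM_DEVICES = 1          # assumption: one MIG slice, no data parallelism
MAX_STEPS = 1000
WARMUP_STEPS = 5
PEAK_LR = 2e-4
DATASET_SIZE = 60_000

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES
examples_seen = effective_batch * MAX_STEPS  # assumes one example per sequence (no packing)
epoch_fraction = examples_seen / DATASET_SIZE


def linear_lr(step: int) -> float:
    """Learning rate at `step`: linear warmup to PEAK_LR, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * max(0.0, (MAX_STEPS - step) / (MAX_STEPS - WARMUP_STEPS))


print(effective_batch, examples_seen, round(epoch_fraction, 3))
print(linear_lr(0), linear_lr(WARMUP_STEPS), linear_lr(MAX_STEPS))
```

Under these assumptions, 1,000 steps cover roughly 13% of the 60K examples, so training stops well short of one epoch.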

LoRA Configuration

Parameter Value
LoRA Rank (r) 16
LoRA Alpha 16
LoRA Dropout 0
Quantization 4-bit QLoRA
Target Modules attention + MLP (via FastModel)
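In LoRA, each targeted weight matrix stays frozen and receives a low-rank update B @ A scaled by alpha / r. With r = 16 and alpha = 16, the scale is therefore 1.0. The sketch below computes that scale and the number of trainable adapter parameters for one linear layer. It assumes standard LoRA scaling (not rsLoRA), and the 2048 x 8192 layer shape is hypothetical, not Gemma 4's actual dimensions.

```python
def lora_scale(alpha: float, r: int) -> float:
    """Standard LoRA scaling factor applied to B @ A."""
    return alpha / r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one d_in -> d_out linear layer.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * d_in + d_out * r


# Values from the table above
R, ALPHA = 16, 16

# Hypothetical projection shape, for illustration only
d_in, d_out = 2048, 8192
full = d_in * d_out
adapter = lora_param_count(d_in, d_out, R)

print(lora_scale(ALPHA, R))  # 1.0
print(adapter, full, round(adapter / full * 100, 2))
```

For this hypothetical layer, the adapter adds under 1% of the frozen layer's parameter count. This ratio is why 4-bit base weights plus a small trainable adapter fit on a 40 GB MIG slice.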

Dataset

Property Value
Dataset xLAM Function Calling 60K
Dataset Size 60,000 examples (about 8,000 used: 1,000 steps × 8, assuming no packing)
Format XML-tagged: <query>, <tools>, <answers>
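An xLAM record stores `query` as plain text and stores `tools` and `answers` as JSON-encoded strings. The sketch below wraps one record in the three XML tags listed above and turns it into a user/assistant message pair. The card does not publish its exact training template, so the tag order, whitespace, and role split are assumptions. The record itself is illustrative: the tool definition and answers are hypothetical values built around this card's example query.

```python
import json


def to_messages(record: dict) -> list[dict]:
    """Convert one xLAM-style record into chat messages (assumed layout)."""
    tools = json.loads(record["tools"])      # list of tool definitions
    answers = json.loads(record["answers"])  # list of {"name", "arguments"} calls
    user = (
        f"<query>{record['query']}</query>\n"
        f"<tools>{json.dumps(tools)}</tools>"
    )
    assistant = f"<answers>{json.dumps(answers)}</answers>"
    return [
        {"role": "user", "content": user},
        {"role": "assistant", "content": assistant},
    ]


# Illustrative record in the xLAM field layout (values are hypothetical)
record = {
    "query": "Check if the numbers 8 and 1233 are powers of two.",
    "tools": json.dumps([{
        "name": "is_power_of_two",
        "description": "Checks if a number is a power of two.",
        "parameters": {"num": {"description": "The number to check.", "type": "int"}},
    }]),
    "answers": json.dumps([
        {"name": "is_power_of_two", "arguments": {"num": 8}},
        {"name": "is_power_of_two", "arguments": {"num": 1233}},
    ]),
}

for m in to_messages(record):
    print(m["role"], "->", m["content"])
```

The resulting message list has the shape that `tokenizer.apply_chat_template` accepts in the Quick Start.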

Hardware

Property Value
GPU NVIDIA H100 80GB HBM3 (MIG 3g.40gb slice)
Cluster DRAC Fir (Compute Canada)
Execution Papermill on SLURM

Usage

Quick Start (Transformers)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ermiaazarkhalili/Gemma4-E4B-Function-Calling-xLAM-Unsloth"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Check if the numbers 8 and 1233 are powers of two."}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
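The Quick Start prints raw text. To act on a function call, the `<answers>` payload must be extracted, decoded, and dispatched to a real function. Below is a minimal sketch. It assumes the model emits the same `<answers>[...]</answers>` JSON list it was trained on, which is not guaranteed, so the parser returns an empty list when the block is missing or malformed. `parse_tool_calls`, `is_power_of_two`, and `TOOLS` are hypothetical helpers, not part of Transformers.

```python
import json
import re

ANSWERS_RE = re.compile(r"<answers>(.*?)</answers>", re.DOTALL)


def parse_tool_calls(response: str) -> list[dict]:
    """Return the {"name", "arguments"} calls found inside <answers>...</answers>."""
    match = ANSWERS_RE.search(response)
    if match is None:
        return []
    try:
        calls = json.loads(match.group(1))
    except json.JSONDecodeError:
        return []
    if not isinstance(calls, list):
        return []
    # Keep only well-formed call objects
    return [c for c in calls if isinstance(c, dict) and "name" in c]


def is_power_of_two(num: int) -> bool:
    """Hypothetical local implementation of the tool the model calls."""
    return num > 0 and num & (num - 1) == 0


TOOLS = {"is_power_of_two": is_power_of_two}

sample = '<answers>[{"name": "is_power_of_two", "arguments": {"num": 8}}]</answers>'
for call in parse_tool_calls(sample):
    print(call["name"], TOOLS[call["name"]](**call.get("arguments", {})))
```

In practice, pass `response` from the Quick Start instead of `sample`. Check `call["name"] in TOOLS` before dispatching, because the model may name a tool you did not provide.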

Using with Unsloth (Fastest)

from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    "ermiaazarkhalili/Gemma4-E4B-Function-Calling-xLAM-Unsloth",
    max_seq_length=2048,
    load_in_4bit=True,
)

4-bit Quantized Inference

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

model = AutoModelForCausalLM.from_pretrained(
    "ermiaazarkhalili/Gemma4-E4B-Function-Calling-xLAM-Unsloth",
    quantization_config=quantization_config,
    device_map="auto",
)
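To see why 4-bit loading helps, estimate weight memory from the parameter count and the bits per parameter. This back-of-the-envelope sketch assumes about 8B stored parameters (the Safetensors size shown on the model page). It also assumes roughly 4.127 bits per parameter for NF4 with double quantization, the figure from the QLoRA paper. It ignores activations, the KV cache, and any modules bitsandbytes leaves unquantized, so real usage will be higher.

```python
def weight_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 8e9  # assumption: ~8B stored parameters

estimates = {
    "bf16": weight_gb(NUM_PARAMS, 16),
    "nf4 + double quant": weight_gb(NUM_PARAMS, 4.127),  # 4 bits + ~0.127 bits of quantization constants
}
for name, gb in estimates.items():
    print(f"{name}: ~{gb:.1f} GB")
```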

GGUF Versions

Quantized GGUF versions for CPU and edge inference are available at: Gemma4-E4B-Function-Calling-xLAM-Unsloth-GGUF

Format Description
Q4_K_M Recommended — good balance of quality and size
Q5_K_M Higher quality, slightly larger
Q8_0 Near-lossless, largest of the three

Using with Ollama

ollama pull hf.co/ermiaazarkhalili/Gemma4-E4B-Function-Calling-xLAM-Unsloth-GGUF:Q4_K_M
ollama run hf.co/ermiaazarkhalili/Gemma4-E4B-Function-Calling-xLAM-Unsloth-GGUF:Q4_K_M "Check if the numbers 8 and 1233 are powers of two."
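Once the model is pulled, you can also call it from code through Ollama's local REST API. Below is a minimal standard-library sketch. It assumes the Ollama server is running at its default address `http://localhost:11434` and uses the non-streaming `/api/chat` endpoint. `build_chat_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local Ollama address (assumption)
MODEL = "hf.co/ermiaazarkhalili/Gemma4-E4B-Function-Calling-xLAM-Unsloth-GGUF:Q4_K_M"


def build_chat_request(prompt: str) -> bytes:
    """Encode a single-turn, non-streaming chat request for Ollama."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return json.dumps(payload).encode("utf-8")


def chat(prompt: str) -> str:
    """POST the request and return the assistant's reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_chat_request(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```

With the server running, `print(chat("Check if the numbers 8 and 1233 are powers of two."))` sends the same query as the `ollama run` command.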

Using with llama.cpp

./llama-cli -m Gemma4-E4B-Function-Calling-xLAM-Unsloth-Q4_K_M.gguf -p "Check if the numbers 8 and 1233 are powers of two." -n 512

Limitations

  • Language: Primarily trained on English data
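  • Knowledge Cutoff: Limited to the base model's training data cutoff
  • Hallucinations: May generate plausible-sounding but incorrect information, including incorrect tool calls or arguments
  • Context Length: Fine-tuned on sequences of up to 2,048 tokens; longer prompts (for example, large tool lists) were not covered by fine-tuning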
  • Safety: Not extensively safety-tuned; use with appropriate guardrails

Training Framework Versions

Package Version
Unsloth 2026.4.4
TRL 0.24.0
Transformers 5.5.0
PyTorch 2.9.0
Datasets 4.3.0
PEFT 0.18.1
BitsAndBytes 0.49.2

Citation

@misc{ermiaazarkhalili_gemma4_e4b_function_calling_xlam_unsloth,
    author = {ermiaazarkhalili},
    title = {Gemma4-E4B-Function-Calling-xLAM-Unsloth: Fine-tuned Gemma4-E4B-it with Unsloth},
    year = {2026},
    publisher = {Hugging Face},
    howpublished = {\url{https://huggingface.co/ermiaazarkhalili/Gemma4-E4B-Function-Calling-xLAM-Unsloth}}
}

Acknowledgments
