Gemma4-E4B-Function-Calling-xLAM-Unsloth

This model is a fine-tuned version of Gemma4-E4B-it optimized for function calling. It was fine-tuned with Unsloth, chosen for its 2x faster training and 60% lower VRAM use.

It was trained on the Salesforce/xlam-function-calling-60k dataset, which contains 60,000 function-calling examples, each with a query, tool definitions, and structured answers.

Overview

Property	Value
Developed by	ermiaazarkhalili
License	GEMMA
Language	English
Base Model	Gemma4-E4B-it
Model Size	4B effective parameters (the Safetensors checkpoint holds 8B parameters in BF16)
Training Framework	Unsloth + TRL
Training Method	SFT with QLoRA (4-bit)
Context Length	2,048 tokens
GGUF Available	Gemma4-E4B-Function-Calling-xLAM-Unsloth-GGUF

Training Configuration

SFT + LoRA Settings

Parameter	Value
Unsloth Class	`FastModel`
Chat Template	gemma-4
Learning Rate	2e-4
Batch Size	2 per device
Gradient Accumulation	4 steps
Effective Batch Size	8
Max Steps	1,000
Optimizer	AdamW 8-bit
LR Scheduler	Linear
Warmup Steps	5
Precision	Auto (BF16/FP16)
Gradient Checkpointing	Enabled (Unsloth optimized)
Seed	3407
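
The batch settings above fit together by simple arithmetic. The sketch below assumes a single training device (the card does not state the device count explicitly) and shows how many examples 1,000 optimizer steps would cover relative to the 60,000-example dataset.

```python
# Values taken from the SFT settings table above.
per_device_batch_size = 2
gradient_accumulation_steps = 4
max_steps = 1_000
dataset_size = 60_000  # xLAM Function Calling 60K

# Assumption: one training device; the card does not state the count.
num_devices = 1

# Each optimizer step consumes one effective batch.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices
examples_seen = effective_batch_size * max_steps
fraction_of_dataset = examples_seen / dataset_size

print(f"effective batch size: {effective_batch_size}")
print(f"examples seen in {max_steps} steps: {examples_seen}")
print(f"fraction of dataset: {fraction_of_dataset:.1%}")
```

Under that assumption, and if Max Steps was the binding limit, the run covers 8,000 examples, roughly 13% of the dataset, which is well short of one full epoch.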

LoRA Configuration

Parameter	Value
LoRA Rank (r)	16
LoRA Alpha	16
LoRA Dropout	0
Quantization	4-bit QLoRA
Target Modules	attention + MLP (via FastModel)
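
To make the rank and alpha values concrete, the following sketch computes the LoRA scaling factor and the number of trainable parameters one adapter adds to a single linear layer. It assumes standard LoRA scaling (alpha / r) rather than a rank-stabilized variant, and the layer dimensions are hypothetical, chosen only for illustration; they are not the actual Gemma dimensions.

```python
def lora_scaling(alpha: int, rank: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update B @ A."""
    return alpha / rank


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)


rank, alpha = 16, 16      # from the LoRA configuration table
d_in, d_out = 2048, 2048  # hypothetical layer shape, for illustration only

scaling = lora_scaling(alpha, rank)
adapter_params = lora_trainable_params(d_in, d_out, rank)
full_params = d_in * d_out

print(f"scaling factor: {scaling}")
print(f"adapter parameters: {adapter_params}")
print(f"share of the full layer: {adapter_params / full_params:.2%}")
```

With rank and alpha both set to 16, the scaling factor is 1.0, so the low-rank update is added to the frozen weights without extra amplification.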

Dataset

Property	Value
Dataset	xLAM Function Calling 60K
Training Samples	60,000
Format	XML-tagged: `<query>`, `<tools>`, `<answers>`
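
As an illustration of the XML-tagged format, the sketch below wraps one example in the three tags named in the table. The tag names come from this card; the tag order, whitespace, JSON serialization, and the helper name `format_xlam_example` are assumptions, and the tool definition is illustrative rather than copied from the dataset.

```python
import json


def format_xlam_example(query: str, tools: list, answers: list) -> str:
    """Wrap one example in <query>, <tools>, and <answers> tags.

    Assumption: tools and answers are serialized as JSON inside their tags.
    """
    return (
        f"<query>{query}</query>\n"
        f"<tools>{json.dumps(tools)}</tools>\n"
        f"<answers>{json.dumps(answers)}</answers>"
    )


# Illustrative tool definition and expected calls (hypothetical values).
tools = [
    {
        "name": "is_power_of_two",
        "description": "Checks if a number is a power of two.",
        "parameters": {"num": {"type": "int", "description": "The number to check."}},
    }
]
answers = [
    {"name": "is_power_of_two", "arguments": {"num": 8}},
    {"name": "is_power_of_two", "arguments": {"num": 1233}},
]

example = format_xlam_example(
    "Check if the numbers 8 and 1233 are powers of two.", tools, answers
)
print(example)
```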

Hardware

Property	Value
GPU	NVIDIA H100 80GB HBM3 (MIG 3g.40gb slice)
Cluster	DRAC Fir (Compute Canada)
Execution	Papermill on SLURM

Usage

Quick Start (Transformers)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ermiaazarkhalili/Gemma4-E4B-Function-Calling-xLAM-Unsloth"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Check if the numbers 8 and 1233 are powers of two."}
]

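# Tokenize through the chat template in one step; re-tokenizing a templated
# string can add special tokens a second time.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)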

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
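
The card does not show a sample response, so how the generated text should be parsed is not specified. The sketch below assumes the model mirrors its training format and emits a JSON list of calls, optionally wrapped in `<answers>` tags; the helper name `parse_tool_calls` and the sample string are hypothetical.

```python
import json
import re


def parse_tool_calls(response: str) -> list:
    """Extract tool calls from generated text; return [] if nothing parses.

    Assumption: calls are JSON objects with a "name" key, either inside
    <answers>...</answers> or as the whole response.
    """
    match = re.search(r"<answers>(.*?)</answers>", response, flags=re.DOTALL)
    payload = match.group(1) if match else response
    try:
        calls = json.loads(payload.strip())
    except json.JSONDecodeError:
        return []
    if isinstance(calls, dict):
        calls = [calls]  # tolerate a single call that is not wrapped in a list
    elif not isinstance(calls, list):
        return []
    return [c for c in calls if isinstance(c, dict) and "name" in c]


# Hypothetical response text, used only to exercise the parser.
sample = '<answers>[{"name": "is_power_of_two", "arguments": {"num": 8}}]</answers>'
for call in parse_tool_calls(sample):
    print(call["name"], call["arguments"])
```

Returning an empty list on malformed output lets the caller decide whether to retry generation or fall back to a plain-text answer.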

Using with Unsloth (Fastest)

from unsloth import FastModel

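# Load the model and tokenizer through Unsloth's FastModel wrapper.
model, tokenizer = FastModel.from_pretrained(
    "ermiaazarkhalili/Gemma4-E4B-Function-Calling-xLAM-Unsloth",
    max_seq_length=2048,  # matches the fine-tuning context length
    load_in_4bit=True,  # load 4-bit quantized weights to reduce VRAM use
)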

4-bit Quantized Inference

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

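# NF4 4-bit weights with double quantization; computation runs in bfloat16.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for computation
    bnb_4bit_use_double_quant=True,  # also quantize the quantization constants
    bnb_4bit_quant_type="nf4",  # 4-bit NormalFloat data type
)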

model = AutoModelForCausalLM.from_pretrained(
    "ermiaazarkhalili/Gemma4-E4B-Function-Calling-xLAM-Unsloth",
    quantization_config=quantization_config,
    device_map="auto",
)

GGUF Versions

Quantized GGUF versions for CPU and edge inference are available in the ermiaazarkhalili/Gemma4-E4B-Function-Calling-xLAM-Unsloth-GGUF repository.

Format	Description
`Q4_K_M`	Recommended: good balance of quality and size
`Q5_K_M`	Higher quality, slightly larger
`Q8_0`	Near-lossless, largest GGUF size

Using with Ollama

ollama pull hf.co/ermiaazarkhalili/Gemma4-E4B-Function-Calling-xLAM-Unsloth-GGUF:Q4_K_M
ollama run hf.co/ermiaazarkhalili/Gemma4-E4B-Function-Calling-xLAM-Unsloth-GGUF:Q4_K_M "Check if the numbers 8 and 1233 are powers of two."
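
Once the model has been pulled, it can also be queried programmatically. The sketch below builds a request for Ollama's local chat endpoint using only the standard library; it assumes Ollama is listening on its default port 11434, and the helper names `build_chat_request` and `chat` are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumed default local endpoint
MODEL = "hf.co/ermiaazarkhalili/Gemma4-E4B-Function-Calling-xLAM-Unsloth-GGUF:Q4_K_M"


def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build a non-streaming chat request for a single user message."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the reply text (needs a running Ollama server)."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        return json.load(resp)["message"]["content"]


request = build_chat_request("Check if the numbers 8 and 1233 are powers of two.")
print(request.full_url)
# With the server running: print(chat("Check if the numbers 8 and 1233 are powers of two."))
```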

Using with llama.cpp

./llama-cli -m Gemma4-E4B-Function-Calling-xLAM-Unsloth-Q4_K_M.gguf -p "Check if the numbers 8 and 1233 are powers of two." -n 512

Limitations

Language: Primarily trained on English data
Knowledge Cutoff: Limited to the base model's training data cutoff
Hallucinations: May generate plausible-sounding but incorrect information, including incorrect function names or arguments
Context Length: Fine-tuned with a 2,048-token context window
Safety: Not extensively safety-tuned; use with appropriate guardrails

Training Framework Versions

Package	Version
Unsloth	2026.4.4
TRL	0.24.0
Transformers	5.5.0
PyTorch	2.9.0
Datasets	4.3.0
PEFT	0.18.1
BitsAndBytes	0.49.2

Citation

@misc{ermiaazarkhalili_gemma4_e4b_function_calling_xlam_unsloth,
    author = {ermiaazarkhalili},
    title = {Gemma4-E4B-Function-Calling-xLAM-Unsloth: Fine-tuned Gemma4-E4B-it with Unsloth},
    year = {2026},
    publisher = {Hugging Face},
    howpublished = {\url{https://huggingface.co/ermiaazarkhalili/Gemma4-E4B-Function-Calling-xLAM-Unsloth}}
}

Acknowledgments

Unsloth for 2x faster fine-tuning
Google for the base model
Hugging Face TRL Team for the training library
Salesforce xLAM for the function calling dataset
Compute Canada / DRAC for HPC resources
