---
license: apache-2.0
tags:
- code
- qwen3
- finetuned
- python
- competitive-programming
base_model: Qwen/Qwen3-4B
datasets:
- microsoft/rStar-Coder
language:
- en
pipeline_tag: text-generation
model-index:
- name: qwen3-4b-code-finetuned
  results:
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: HumanEval
      type: openai_humaneval
    metrics:
    - name: pass@1
      type: pass@1
      value: 68.9
      verified: false
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: HumanEval+
      type: evalplus/humanevalplus
    metrics:
    - name: pass@1
      type: pass@1
      value: 64.0
      verified: false
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: MBPP
      type: mbpp
    metrics:
    - name: pass@1
      type: pass@1
      value: 58.2
      verified: false
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: MBPP+
      type: evalplus/mbppplus
    metrics:
    - name: pass@1
      type: pass@1
      value: 50.8
      verified: false
---

# Qwen3-4B Code Fine-Tuned

Qwen3-4B fine-tuned on 10K execution-verified reasoning traces from rStar-Coder (one epoch of supervised fine-tuning).

**Optimized for algorithmic/competitive programming tasks.**


## 📊 Performance (EvalPlus Framework)

| Benchmark | Base tests (pass@1) | Plus tests (pass@1) | Change vs. base model (base tests) |
|-----------|------|------|---------------|
| **HumanEval** | **68.9%** | **64.0%** | **+6.9 pp** ✅ |
| **MBPP** | **58.2%** | **50.8%** | **-8.8 pp** ⚠️ |

*Evaluated using [EvalPlus](https://github.com/evalplus/evalplus) with greedy decoding. "Plus tests" are the extended HumanEval+/MBPP+ test suites; pp = percentage points.*

### Performance Trade-off

- ✅ **Improved on complex algorithmic tasks** (HumanEval: 62% → 68.9%)
- ⚠️ **Regression on simple practical tasks** (MBPP: 67% → 58.2%)

**Why?** The model was trained on competition-style problems (LeetCode, Codeforces), which emphasize algorithmic reasoning over simple utility functions.

**Use this model if:** You need help with algorithms, data structures, competitive programming  
**Use base model if:** You need simple utility functions, basic string/list operations
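
The changes reported above are differences in pass@1 percentage points, not relative improvements. As a minimal sketch, the snippet below recomputes them from the scores on this card; the dictionary names and the `delta_pp` helper are illustrative and not part of EvalPlus.

```python
# Scores copied from this card (pass@1, in percent).
base_model = {"HumanEval": 62.0, "MBPP": 67.0}
fine_tuned = {"HumanEval": 68.9, "MBPP": 58.2}
fine_tuned_plus = {"HumanEval": 64.0, "MBPP": 50.8}


def delta_pp(new: float, old: float) -> float:
    """Difference between two percentages, in percentage points (hypothetical helper)."""
    return round(new - old, 1)


for name in base_model:
    # Change relative to the base model on the original test suite.
    vs_base = delta_pp(fine_tuned[name], base_model[name])
    # Drop when the same model is scored on the stricter Plus test suite.
    plus_drop = delta_pp(fine_tuned_plus[name], fine_tuned[name])
    print(f"{name}: {vs_base:+.1f} pp vs base model, {plus_drop:+.1f} pp on Plus tests")
```

Reading the numbers this way makes the trade-off explicit: the gain on HumanEval is smaller than the loss on MBPP, so the choice between the two models depends on which kind of task dominates your workload.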

## 🚀 Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "prometheus04/qwen3-4b-code-finetuned",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained("prometheus04/qwen3-4b-code-finetuned", trust_remote_code=True)

# Complete a function
messages = [
    {"role": "system", "content": "You are a programming expert."},
    {"role": "user", "content": "def fibonacci(n):\n    "}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

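# Greedy decoding, matching the evaluation setup. Use do_sample=False rather than
# temperature=0.0, which is not a valid sampling temperature in transformers.
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))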
```
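
Because the model was trained on reasoning traces, a response may contain explanatory text around the final solution. The sketch below shows one way to pull out just the code. It assumes the response may include a `<think>...</think>` section and that the solution appears in a Markdown code fence; neither format is guaranteed by this card, and `extract_code` is a hypothetical helper, not part of `transformers`.

```python
import re

# A Markdown code fence, built indirectly so this example nests cleanly in a README.
FENCE = "`" * 3


def extract_code(response: str) -> str:
    """Return the last fenced code block in `response`, or the stripped text if none."""
    # Drop an optional <think>...</think> reasoning section (assumed output format).
    answer = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL)
    # Match fenced blocks with an optional language tag such as "python".
    pattern = FENCE + r"[\w+-]*\n(.*?)" + FENCE
    blocks = re.findall(pattern, answer, flags=re.DOTALL)
    return blocks[-1].strip() if blocks else answer.strip()


sample = (
    "<think>Use iteration.</think>\nHere is the solution:\n"
    + FENCE + "python\ndef fibonacci(n):\n    a, b = 0, 1\n"
    + "    for _ in range(n):\n        a, b = b, a + b\n    return a\n"
    + FENCE + "\n"
)
print(extract_code(sample))
```

Taking the last fenced block is a design choice: responses that sketch intermediate attempts usually end with the final answer, but you may prefer the first block or a stricter check for your use case.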

## 📝 Training Details

- **Base Model**: Qwen/Qwen3-4B (4B parameters)
- **Dataset**: microsoft/rStar-Coder synthetic_sft (10K samples)
  - Competition problems from LeetCode, Codeforces, etc.
  - Execution-verified solutions with reasoning traces
- **Method**: LoRA fine-tuning
  - Rank: 32
  - Alpha: 64
  - Target modules: All linear layers (q,k,v,o,gate,up,down)
  - rsLoRA: Enabled
- **Training**:
  - Epochs: 1
  - Batch size: 2 × 8 grad accum = 16 effective
  - Learning rate: 2e-4 (cosine schedule)
  - Optimizer: AdamW 8-bit
  - Max seq length: 4096
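
The hyperparameters above imply a few derived quantities. The sketch below works them out, assuming a single training device and exactly 10,000 samples (the card says "10K"), and using the rsLoRA scaling rule `alpha / sqrt(rank)` in place of the standard LoRA `alpha / rank`. The variable names are illustrative.

```python
import math

# Values taken from the training details above.
num_samples = 10_000       # "10K samples"; the exact count is an assumption
per_device_batch = 2
grad_accum_steps = 8
lora_rank = 32
lora_alpha = 64

# Effective batch size on one device (multiply by the device count otherwise).
effective_batch = per_device_batch * grad_accum_steps

# Optimizer steps needed for one epoch.
steps_per_epoch = math.ceil(num_samples / effective_batch)

# Adapter scaling factor: standard LoRA vs. rank-stabilized LoRA (rsLoRA).
standard_scale = lora_alpha / lora_rank
rslora_scale = lora_alpha / math.sqrt(lora_rank)

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps per epoch: {steps_per_epoch}")
print(f"LoRA scale: {standard_scale:.2f}, rsLoRA scale: {rslora_scale:.2f}")
```

With rank 32, rsLoRA scales the adapter update several times more strongly than standard LoRA would at the same alpha, which is worth keeping in mind when reusing the 2e-4 learning rate with a different rank or with rsLoRA disabled.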


## 💡 Key Features

✅ Trained on execution-verified competition solutions  
✅ Curriculum learning (easy → hard)  
✅ Decontaminated against HumanEval/MBPP  
✅ Efficient LoRA (1.62% trainable params)  
✅ Production-ready merged weights  

## 📈 Comparison

| Model | HumanEval | MBPP | Specialization |
|-------|-----------|------|----------------|
| Qwen3-4B Base | 62% | 67% | General |
| **This Model** | **68.9%** | 58.2% | **Algorithms** |
| GPT-3.5-turbo | ~75% | ~70% | General |

## 🎯 Strengths

- Binary search, dynamic programming, graph algorithms
- Recursion, backtracking, tree traversal
- Complex data structure manipulation
- Competitive programming patterns

## ⚠️ Limitations

- **Not recommended for simple utility functions** (use base model instead)
- Trained on Python-only data
- May overthink simple problems
- Best for algorithmic/competitive programming tasks
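- Trained with a 4096-token maximum sequence length, so it works best when the prompt and solution together fit within about 4K tokens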

## 🔧 Recommended Use Cases

✅ LeetCode/HackerRank style problems  
✅ Algorithm implementation  
✅ Data structure coding  
✅ Competitive programming practice  
✅ Technical interview preparation  

❌ Simple string manipulation  
❌ Basic list operations  
❌ Trivial utility functions  

## 📄 Citation
```bibtex
@misc{qwen3-4b-code-finetuned,
  author = {prometheus04},
  title = {Qwen3-4B Code Fine-Tuned on rStar-Coder},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/prometheus04/qwen3-4b-code-finetuned}},
}
```

## 📜 License

Apache 2.0 (inherited from base model)