---
license: apache-2.0
tags:
- code
- qwen3
- finetuned
- python
- competitive-programming
base_model: Qwen/Qwen3-4B
datasets:
- microsoft/rStar-Coder
language:
- en
pipeline_tag: text-generation
model-index:
- name: qwen3-4b-code-finetuned
  results:
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: HumanEval
      type: openai_humaneval
    metrics:
    - name: pass@1
      type: pass@1
      value: 68.9
      verified: false
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: HumanEval+
      type: evalplus/humanevalplus
    metrics:
    - name: pass@1
      type: pass@1
      value: 64.0
      verified: false
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: MBPP
      type: mbpp
    metrics:
    - name: pass@1
      type: pass@1
      value: 58.2
      verified: false
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: MBPP+
      type: evalplus/mbppplus
    metrics:
    - name: pass@1
      type: pass@1
      value: 50.8
      verified: false
---
# Qwen3-4B Code Fine-Tuned
Fine-tuned Qwen3-4B on 10K verified reasoning traces from rStar-Coder (1 epoch SFT).
**Optimized for algorithmic/competitive programming tasks.**
## Performance (EvalPlus Framework)
| Benchmark | Base tests (pass@1) | Plus tests (pass@1) | Change vs. base model |
|-----------|---------------------|---------------------|-----------------------|
| **HumanEval** | **68.9%** | **64.0%** | **+6.9 pts** ✅ |
| **MBPP** | **58.2%** | **50.8%** | **-8.8 pts** ⚠️ |
*Evaluated using [EvalPlus](https://github.com/evalplus/evalplus) with greedy decoding*
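
With greedy decoding there is a single completion per problem, so pass@1 reduces to the share of problems whose one completion passes every test. A minimal sketch of that calculation follows; the function name and the toy results are illustrative assumptions, not part of EvalPlus or of this model's evaluation data.

```python
def pass_at_1(passed: list[bool]) -> float:
    """Return pass@1 in percent, given one greedy sample per problem."""
    if not passed:
        raise ValueError("need at least one problem result")
    return 100.0 * sum(passed) / len(passed)


# Hypothetical per-problem outcomes (True = all tests passed); not real data.
toy_results = [True, False, True, True]
print(f"pass@1 = {pass_at_1(toy_results):.1f}%")
```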
### Performance Trade-off
- ✅ **Improved on complex algorithmic tasks** (HumanEval: 62% → 68.9%)
- ⚠️ **Regression on simple practical tasks** (MBPP: 67% → 58.2%)
**Why?** The model was trained on competition-style problems (LeetCode, Codeforces), which emphasize algorithmic reasoning over simple utility functions.

**Use this model if:** you need help with algorithms, data structures, or competitive programming.

**Use the base model if:** you need simple utility functions or basic string/list operations.
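
The size of the trade-off can be read two ways: as an absolute change in percentage points, or as a change relative to the base model's score. The short sketch below computes both from the pass@1 figures reported in this card; the helper name `deltas` is only illustrative.

```python
# Scores reported in this card (pass@1, percent, base test suites).
reported = {
    "HumanEval": {"base_model": 62.0, "fine_tuned": 68.9},
    "MBPP": {"base_model": 67.0, "fine_tuned": 58.2},
}


def deltas(before: float, after: float) -> tuple[float, float]:
    """Return (absolute change in points, relative change in percent)."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return round(absolute, 1), round(relative, 1)


for name, scores in reported.items():
    pts, rel = deltas(scores["base_model"], scores["fine_tuned"])
    print(f"{name}: {pts:+.1f} points ({rel:+.1f}% relative)")
```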
## Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "prometheus04/qwen3-4b-code-finetuned"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Complete a function
messages = [
    {"role": "system", "content": "You are a programming expert."},
    {"role": "user", "content": "def fibonacci(n):\n "},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Greedy decoding: pass do_sample=False instead of temperature=0.0, because
# transformers requires a strictly positive temperature when sampling is on.
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
## Training Details
- **Base Model**: Qwen/Qwen3-4B (4B parameters)
- **Dataset**: microsoft/rStar-Coder synthetic_sft (10K samples)
  - Competition problems from LeetCode, Codeforces, etc.
  - Execution-verified solutions with reasoning traces
- **Method**: LoRA fine-tuning
  - Rank: 32
  - Alpha: 64
  - Target modules: All linear layers (q, k, v, o, gate, up, down)
  - rsLoRA: Enabled
- **Training**:
  - Epochs: 1
  - Batch size: 2 × 8 gradient accumulation steps = 16 effective
  - Learning rate: 2e-4 (cosine schedule)
  - Optimizer: AdamW 8-bit
  - Max sequence length: 4096
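
The arithmetic behind these settings can be sketched in a few lines. This is a rough illustration under stated assumptions, not the training script: it assumes exactly 10,000 samples, a single device, and no learning-rate warmup (none of which the card states), and the `lora_kwargs` dict merely mirrors parameter names from `peft.LoraConfig`, with the short module names expanded to the usual `*_proj` names as a guess at the tooling used.

```python
import math

# Values taken from the list above.
num_samples = 10_000          # "10K samples", assumed to be exactly 10,000
per_device_batch = 2
grad_accum_steps = 8
peak_lr = 2e-4
rank, alpha = 32, 64

effective_batch = per_device_batch * grad_accum_steps
total_steps = math.ceil(num_samples / effective_batch)  # optimizer steps in one epoch

# Standard LoRA scales the adapter update by alpha / r;
# rsLoRA (rank-stabilized LoRA) uses alpha / sqrt(r) instead.
standard_scale = alpha / rank
rslora_scale = alpha / math.sqrt(rank)


def cosine_lr(step: int, total: int = total_steps, peak: float = peak_lr) -> float:
    """Cosine decay from peak to 0, assuming no warmup (not stated in the card)."""
    return 0.5 * peak * (1.0 + math.cos(math.pi * step / total))


# Keyword arguments mirroring peft.LoraConfig parameter names (assumed tooling).
lora_kwargs = dict(
    r=rank,
    lora_alpha=alpha,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_rslora=True,
    task_type="CAUSAL_LM",
)

print(f"effective batch: {effective_batch}, steps per epoch: {total_steps}")
print(f"LoRA scale: {standard_scale}, rsLoRA scale: {rslora_scale:.2f}")
print(f"lr at start / end: {cosine_lr(0)} / {cosine_lr(total_steps)}")
```

With rsLoRA enabled, the adapter update is scaled noticeably more strongly at rank 32 than under the standard `alpha / r` rule, which is worth keeping in mind when reusing the learning rate with a different rank.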
## 💡 Key Features
- ✅ Trained on execution-verified competition solutions
- ✅ Curriculum learning (easy → hard)
- ✅ Decontaminated from HumanEval/MBPP
- ✅ Efficient LoRA (1.62% trainable params)
- ✅ Production-ready merged weights
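
The trainable-parameter share can be sanity-checked with a back-of-the-envelope count. The sketch below assumes the layer dimensions of Qwen3-4B (hidden size 2560, MLP size 9728, 36 layers, 32 query heads and 8 key/value heads of dimension 128) and a base size of roughly 4.02B parameters; these figures are assumptions to be checked against the model's `config.json`, not values stated in this card.

```python
# Assumed Qwen3-4B dimensions (verify against the model's config.json).
hidden = 2560
intermediate = 9728
num_layers = 36
q_out = 32 * 128   # query heads * head_dim
kv_out = 8 * 128   # key/value heads * head_dim
rank = 32          # LoRA rank from the training details

# (in_features, out_features) of each adapted linear layer in one block.
linear_shapes = {
    "q_proj": (hidden, q_out),
    "k_proj": (hidden, kv_out),
    "v_proj": (hidden, kv_out),
    "o_proj": (q_out, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}


def lora_params(in_f: int, out_f: int, r: int = rank) -> int:
    """LoRA adds A (r x in) and B (out x r): r * (in + out) weights."""
    return r * (in_f + out_f)


adapter_params = num_layers * sum(lora_params(i, o) for i, o in linear_shapes.values())
base_params = 4.02e9  # approximate base model size (assumption)
fraction = adapter_params / (base_params + adapter_params)
print(f"{adapter_params:,} adapter params, {100 * fraction:.2f}% of all params")
```

Under these assumptions the estimate lands close to the 1.62% figure quoted above.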
## Comparison
| Model | HumanEval | MBPP | Specialization |
|-------|-----------|------|----------------|
| Qwen3-4B Base | 62% | 67% | General |
| **This Model** | **68.9%** | 58.2% | **Algorithms** |
| GPT-3.5-turbo | ~75% | ~70% | General |
## 🎯 Strengths
- Binary search, dynamic programming, graph algorithms
- Recursion, backtracking, tree traversal
- Complex data structure manipulation
- Competitive programming patterns
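
To make the task category concrete, here is a hand-written example of the kind of problem listed above (a lower-bound binary search) together with a tiny check harness. It is an illustration only, not output from this model, and the test cases are hypothetical.

```python
from bisect import bisect_left


def lower_bound(nums: list[int], target: int) -> int:
    """Return the first index i with nums[i] >= target (len(nums) if none)."""
    lo, hi = 0, len(nums)
    while lo < hi:
        mid = (lo + hi) // 2
        if nums[mid] < target:
            lo = mid + 1   # answer lies strictly to the right of mid
        else:
            hi = mid       # mid may be the answer; keep it in range
    return lo


# Hypothetical test cases, cross-checked against the standard library.
cases = [([1, 3, 3, 7], 3), ([1, 3, 3, 7], 8), ([], 5), ([2, 4], 1)]
for nums, target in cases:
    assert lower_bound(nums, target) == bisect_left(nums, target)
print("all cases passed")
```

The same pattern, running a candidate function against known answers, is a simple way to verify any solution the model produces before relying on it.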
## ⚠️ Limitations
- **Not recommended for simple utility functions** (use base model instead)
- Trained on Python-only data
- May overthink simple problems
- Best for algorithmic/competitive programming tasks
- Optimal for functions <4K tokens
## 🔧 Recommended Use Cases
- ✅ LeetCode/HackerRank style problems
- ✅ Algorithm implementation
- ✅ Data structure coding
- ✅ Competitive programming practice
- ✅ Technical interview preparation
- ❌ Simple string manipulation
- ❌ Basic list operations
- ❌ Trivial utility functions
## Citation
```bibtex
@misc{qwen3-4b-code-finetuned,
author = {prometheus04},
title = {Qwen3-4B Code Fine-Tuned on rStar-Coder},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/prometheus04/qwen3-4b-code-finetuned}},
}
```
## License
Apache 2.0 (inherited from base model)