Chenlu123/grpo_qwen_qwen2_5_math_1_5b_guru_n8_bz512_mini_bz32_fsdp2_kl0.001 Updated about 16 hours ago
Chenlu123/shampoo_npg_tr_scale_delta20_lam1e-12_warmup_1_graftTrue_qwen2_5_math_1_5b Updated 2 days ago
AgentSPEX: An Agent SPecification and EXecution Language Paper • 2604.13346 • Published 12 days ago • 153
Chenlu123/grpo_warmup_graftTrue_qwen2_5_math_1_5b_guru_n16_bz64_mini_bz64_global_step_80 Updated 18 days ago
Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL Paper • 2603.19470 • Published Mar 19 • 3
Chenlu123/teacher_Qwen3-4B_dapo-math-17k_n8_prompt_bsz_128_mini_bsz_32_step460 2B • Updated Mar 20 • 4
Chenlu123/teacher_Qwen3-4B_dapo-math-17k_n8_prompt_bsz_128_mini_bsz_32_step440 2B • Updated Mar 20 • 4
Chenlu123/teacher_Qwen3-4B_dapo-math-17k_n8_prompt_bsz_128_mini_bsz_32_step420 2B • Updated Mar 20 • 4
Chenlu123/teacher_Qwen3-4B_dapo-math-17k_n8_prompt_bsz_128_mini_bsz_32_step400 2B • Updated Mar 20 • 4