Commit History
Revert "fix: disable CUDA graphs in Newton-Schulz for cpu_offload compatibility" (#29) 313d56a unverified
fix: disable CUDA graphs in Newton-Schulz for cpu_offload compatibility 2dce952
Replace cpu_offload constructor param with turn_on/turn_off API (#26) 05a75f1 unverified
Invalidate AdamW tensor caches on load_state_dict [skip-build] 89b6099
draft commit for cpu_offload (#23) 10848ab unverified
Update fast path comment to reflect current behavior [skip-build] 7e33533
Update comment to reflect use_local_synchronization behavior [skip-build] 3f5cf49
Fix deadlock in construct_shard_mesh with PP + dp_replicate > 1 da7e5da
Muon optimizer: expert batching, parallel caching, A2A overlap [skip-build] 0f37d63
Optimize pipeline: batched update, zero-copy scatter, prelaunch gather [skip-build] 2816b64
Cache AdamW placement grouping and tensor lists [skip-build] 8ca2492
Add torch.compile, CUDA graph, and compiled momentum [skip-build] e74d98f
Add mhc_attn, mhc_ffn, lambda_proj to skip_keys ba293d0
Remove verbose param_groups summary logging 24f0957
Support multi-component expert_keys (e.g. "experts.w1") 5a99e12
Extract is_expert_param() helper to consolidate expert key matching e615b1c
Include original (pre-normalize) FQN in is_muon logging 135fc66
Add info-level logging for param group classification (Muon vs AdamW) 1118752
Use component-level matching for expert_keys to avoid shared_experts collision f008017
Normalize parameter FQNs to handle torch.compile / checkpoint wrappers 95a620f
Apply pre-commit formatting (yapf) [skip-build] bf30b9b
Add max_iter cap and non-finite checks to _optimal_quintic [skip-build] 206b280
Apply pre-commit formatting (yapf, isort) [skip-build] aff01db
Add comment explaining _coeffs_list and Polar Express vs former NS [skip-build] abaa449
Replace hardcoded NS coefficients with analytically optimal ones [skip-build] 573242f
Refactor pipeline to async generator pattern (#16) 33929c0 unverified
Support mHC (#15) ae32572 unverified
Support param group with various placements (#13) e2b41e5 unverified
fix bug in fsdp 811726c
Update torch-ext/optimizer/muon.py b0230e7 unverified
TaehyunKim committed
Update torch-ext/optimizer/muon.py ff2fcfb unverified
TaehyunKim committed
Update muon.py c16b438 unverified
TaehyunKim committed