Kernels
optimizer / torch-ext

Commit History

feat: extend QK-Clip to support MLA (MuonClip Algorithm 1) [skip-build] (#28)
e8e2c81
unverified

dongseokmotif Claude Sonnet 4.6 wyldecat github-actions[bot] committed on

Revert "fix: disable CUDA graphs in Newton-Schulz for cpu_offload compatibility" (#29)
313d56a
unverified

wyldecat github-actions[bot] committed on

fix: disable CUDA graphs in Newton-Schulz for cpu_offload compatibility
2dce952

wyldecat Claude Opus 4.6 (1M context) committed on

Replace cpu_offload constructor param with turn_on/turn_off API (#26)
05a75f1
unverified

wyldecat Claude Opus 4.6 (1M context) github-actions[bot] committed on

Invalidate AdamW tensor caches on load_state_dict [skip-build]
89b6099

ca1207 Claude Opus 4.6 (1M context) committed on

draft commit for cpu_offload (#23)
10848ab
unverified

TaehyunKim github-actions[bot] wyldecat Claude Opus 4.6 (1M context) committed on

Update fast path comment to reflect current behavior [skip-build]
7e33533

wyldecat Claude Opus 4.6 committed on

Update comment to reflect use_local_synchronization behavior [skip-build]
3f5cf49

wyldecat Claude Opus 4.6 committed on

Fix deadlock in construct_shard_mesh with PP + dp_replicate > 1
da7e5da

wyldecat Claude Opus 4.6 committed on

Muon optimizer: expert batching, parallel caching, A2A overlap [skip-build]
0f37d63

wyldecat Claude Opus 4.6 committed on

Optimize pipeline: batched update, zero-copy scatter, prelaunch gather [skip-build]
2816b64

wyldecat Claude Opus 4.6 committed on

Cache AdamW placement grouping and tensor lists [skip-build]
8ca2492

wyldecat Claude Opus 4.6 committed on

Add torch.compile, CUDA graph, and compiled momentum [skip-build]
e74d98f

wyldecat Claude Opus 4.6 committed on

Add mhc_attn, mhc_ffn, lambda_proj to skip_keys
ba293d0

wyldecat Claude Opus 4.6 committed on

Remove verbose param_groups summary logging
24f0957

wyldecat Claude Opus 4.6 committed on

Support multi-component expert_keys (e.g. "experts.w1")
5a99e12

wyldecat Claude Opus 4.6 committed on

Extract is_expert_param() helper to consolidate expert key matching
e615b1c

wyldecat Claude Opus 4.6 committed on

Include original (pre-normalize) FQN in is_muon logging
135fc66

wyldecat Claude Opus 4.6 committed on

Add info-level logging for param group classification (Muon vs AdamW)
1118752

wyldecat Claude Opus 4.6 committed on

Use component-level matching for expert_keys to avoid shared_experts collision
f008017

wyldecat Claude Opus 4.6 committed on

Normalize parameter FQNs to handle torch.compile / checkpoint wrappers
95a620f

wyldecat Claude Opus 4.6 committed on

Apply pre-commit formatting (yapf) [skip-build]
bf30b9b

dongseokmotif Claude Sonnet 4.6 committed on

Add max_iter cap and non-finite checks to _optimal_quintic [skip-build]
206b280

dongseokmotif committed on

Apply pre-commit formatting (yapf, isort) [skip-build]
aff01db

dongseokmotif committed on

Add comment explaining _coeffs_list and Polar Express vs former NS [skip-build]
abaa449

dongseokmotif Claude Sonnet 4.6 committed on

Replace hardcoded NS coefficients with analytically optimal ones [skip-build]
573242f

dongseokmotif Claude Sonnet 4.6 committed on

Refactor pipeline to async generator pattern (#16)
33929c0
unverified

wyldecat github-actions[bot] committed on

Support mHC (#15)
ae32572
unverified

wyldecat github-actions[bot] committed on

Support param group with various placements (#13)
e2b41e5
unverified

wyldecat github-actions[bot] committed on

fix bug in fsdp
811726c

ca1207 committed on

Update torch-ext/optimizer/muon.py
b0230e7
unverified

TaehyunKim committed on

Update torch-ext/optimizer/muon.py
ff2fcfb
unverified

TaehyunKim committed on

Update muon.py
c16b438
unverified

TaehyunKim committed on

fix assert in a2a gather scatter
3dafb3e

ca1207 committed on

delete state in split_func
15336dc

ca1207 committed on

change owner_params to owned_params
6943c45

ca1207 committed on

modify pre step (overlap step) can get from args
589b763

ca1207 committed on

add doc strings + init self rank on init_assign_params
267e8a0

ca1207 committed on

license added for flash_muon
d7cd571

ca1207 committed on

apply pre-commit hook
fceb334

dongseokmotif committed on

consider multi node
39c42e0

dongseokmotif committed on

misc
35894d1

ca1207 committed on

use inplace op in update_g
6e9baad

ca1207 committed on

use COMM_DTYPE instead of hardcoded dtype
2a8631f

ca1207 committed on

apply all2all scatter gather
ff6d675

ca1207 committed on

feat(muon_clip) : add muon clip (#6)
d65066c
unverified

dongseokmotif dongseokmotif github-actions[bot] committed on

feat(muon) : add tuned-abc-values & bfloat16 communication
f7faa93

wyldecat committed on

feat: update muon to receive paramgroups, not model (#4)
b0f46c7
unverified

leejunhyeok junhyeok.lee wyldecat committed on

fix(muon): add update_p stage and dealloc tensors properly
99e7c0c

wyldecat committed on

chore: add .gitignore
79fc8ba

wyldecat committed on