Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published 8 days ago • 182
Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs Paper • 2605.09063 • Published 6 days ago • 75
SenseNova-U1 Collection SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-Unify Architecture • 7 items • Updated 29 minutes ago • 61
HP-Edit: A Human-Preference Post-Training Framework for Image Editing Paper • 2604.19406 • Published 24 days ago • 7
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model Paper • 2604.20796 • Published 23 days ago • 240
Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges Paper • 2604.13602 • Published about 1 month ago • 32
Seeing Fast and Slow: Learning the Flow of Time in Videos Paper • 2604.21931 • Published 22 days ago • 19
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation Paper • 2604.24763 • Published 18 days ago • 70
FlowAnchor: Stabilizing the Editing Signal for Inversion-Free Video Editing Paper • 2604.22586 • Published 21 days ago • 16
Prompt Relay: Inference-Time Temporal Control for Multi-Event Video Generation Paper • 2604.10030 • Published Apr 11 • 15
OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation Paper • 2604.11804 • Published Apr 13 • 72
SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching Paper • 2602.24208 • Published Feb 27 • 7
Mode Seeking meets Mean Seeking for Fast Long Video Generation Paper • 2602.24289 • Published Feb 27 • 41
BitDance: Scaling Autoregressive Generative Models with Binary Tokens Paper • 2602.14041 • Published Feb 15 • 53
Context Forcing: Consistent Autoregressive Video Generation with Long Context Paper • 2602.06028 • Published Feb 5 • 36