ZeroGPU Explorers

community

Activity Feed

Recent Activity

manchery authored a paper 5 days ago

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

KBlueLeaf authored a paper 12 days ago

ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks

omer11a submitted a paper 13 days ago

On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers

Jinfa

authored a paper 18 days ago

SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning

Paper • 2603.23483 • Published 19 days ago • 61

Jinfa

submitted a paper to Daily Papers 19 days ago

SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning

Paper • 2603.23483 • Published 19 days ago • 61

Jinfa

authored a paper 26 days ago

SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models

Paper • 2603.16859 • Published 26 days ago • 248

Jinfa

submitted a paper to Daily Papers 26 days ago

SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models

Paper • 2603.16859 • Published 26 days ago • 248

BestWishYsh

authored a paper about 1 month ago

Helios: Real Real-Time Long Video Generation Model

Paper • 2603.04379 • Published Mar 4 • 186

BestWishYsh

posted an update about 1 month ago

🚀 Introducing Helios: a 14B real-time long-video generation model!

It’s completely wild: it runs faster than 1.3B models, and it does so without self-forcing. Welcome to the new era of video generation! 😎👇

💻 Code: https://github.com/PKU-YuanGroup/Helios
🏠 Page: https://pku-yuangroup.github.io/Helios-Page
📄 Paper: Helios: Real Real-Time Long Video Generation Model (2603.04379)

🔹 True Single-GPU Extreme Speed ⚡️
No need to rely on traditional workarounds like KV-cache, quantization, sparse/linear attention, or TinyVAE. Helios hits an end-to-end 19.5 FPS on a single H100!

Training is also highly accessible: 80 GB of VRAM can fit four 14B models.

🔹 Solving Long-Video "Drift" from the Core 🎥
Tired of visual drift and repetitive loops? We ditched traditional hacks (like error banks, self-forcing, or keyframe sampling).

Instead, our innovative training strategy simulates & eliminates drift directly, keeping minute-long videos incredibly coherent with stunning quality. ✨

🔹 3 Model Variants for Full Coverage 🛠️
With a unified architecture natively supporting T2V, I2V, and V2V, we are open-sourcing 3 flavors:

1️⃣ Base: Single-stage denoising for extremely high fidelity.
2️⃣ Mid: Pyramid denoising + CFG-Zero for the perfect balance of quality & throughput.
3️⃣ Distilled: Adversarial Distillation (DMD) for ultra-fast, few-step generation.

🔹 Day-0 Ecosystem Ready 🌍
We wanted deployment to be a breeze from the second we launched. Helios drops with comprehensive Day-0 hardware and framework support:

✅ Huawei Ascend-NPU
✅ HuggingFace Diffusers
✅ vLLM-Omni
✅ SGLang-Diffusion

Try it out and let us know what you think!

Zhengyi

authored a paper about 2 months ago

NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks

Paper • 2510.15019 • Published Oct 16, 2025 • 65

Taishi-N324

authored a paper about 2 months ago

On the Optimal Reasoning Length for RL-Trained Language Models

Paper • 2602.09591 • Published Feb 10 • 6

wangfuyun

authored a paper 2 months ago

PromptRL: Prompt Matters in RL for Flow-Based Image Generation

Paper • 2602.01382 • Published Feb 1 • 9

wangfuyun

posted an update 2 months ago

PromptRL: Language Models as Co-Learners in Flow-Based Image Generation RL 🚀

We found two critical failure modes in flow-based RL:
1️⃣ Quality-Diversity Dilemma: High-quality models produce similar outputs, bottlenecking RL exploration
2️⃣ Prompt Linguistic Hacking: Models overfit to surface patterns; paraphrase the prompt and performance tanks

Solution: **Jointly train LM + FM**. The LM dynamically generates semantically consistent but diverse prompt variants

📊 Results:
• GenEval: 0.97
• OCR accuracy: 0.98
• PickScore: 24.05
• More than 2× fewer rollouts than flow-only RL

Paper: arxiv.org/abs/2602.01382
Code: github.com/G-U-N/UniRL

#AI #TextToImage #ReinforcementLearning #Diffusion