FlashI2V: Fourier-Guided Latent Shifting Prevents Conditional Image Leakage in Image-to-Video Generation
Paper: arXiv 2509.25187
How to use yunyangge/FlashI2V-1.3B with Diffusers:
pip install -U diffusers transformers accelerate
import torch
from diffusers import DiffusionPipeline
# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("yunyangge/FlashI2V-1.3B", dtype=torch.bfloat16, device_map="cuda")
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
Coming soon...
| Model | I2V Paradigm | Subject Consistency↑ | Background Consistency↑ | Motion Smoothness↑ | Dynamic Degree↑ | Aesthetic Quality↑ | Imaging Quality↑ | I2V Subject Consistency↑ | I2V Background Consistency↑ |
|---|---|---|---|---|---|---|---|---|---|
| SVD-XT-1.0 (1.5B) | Repeating Concat and Adding Noise | 95.52 | 96.61 | 98.09 | 52.36 | 60.15 | 69.80 | 97.52 | 97.63 |
| SVD-XT-1.1 (1.5B) | Repeating Concat and Adding Noise | 95.42 | 96.77 | 98.12 | 43.17 | 60.23 | 70.23 | 97.51 | 97.62 |
| SEINE-512x512 (1.8B) | Inpainting | 95.28 | 97.12 | 97.12 | 27.07 | 64.55 | 71.39 | 97.15 | 96.94 |
| CogVideoX-5B-I2V | Zero-padding Concat and Adding Noise | 94.34 | 96.42 | 98.40 | 33.17 | 61.87 | 70.01 | 97.19 | 96.74 |
| Wan2.1-I2V-14B-720P | Inpainting | 94.86 | 97.07 | 97.90 | 51.38 | 64.75 | 70.44 | 96.95 | 96.44 |
| CogVideoX1.5-5B-I2V† | Zero-padding Concat and Adding Noise | 95.04 | 96.52 | 98.47 | 37.48 | 62.68 | 70.99 | 97.78 | 98.73 |
| Wan2.1-I2V-14B-480P† | Inpainting | 95.68 | 97.44 | 98.46 | 45.20 | 61.44 | 70.37 | 97.83 | 99.08 |
| FlashI2V† (1.3B) | FlashI2V | 95.13 | 96.36 | 98.35 | 53.01 | 62.34 | 69.41 | 97.67 | 98.72 |
† denotes evaluation with recaptioned text-image pairs in VBench-I2V.
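To compare entries at a glance, the reported numbers can be loaded into a small Python structure. The sketch below is illustrative only: it copies the Dynamic Degree and I2V Subject Consistency columns from the table above and ranks the models by Dynamic Degree. The names `RESULTS` and `rank_by_dynamic_degree` are hypothetical and not part of the FlashI2V codebase.

```python
# Values copied from the table above: (Dynamic Degree, I2V Subject Consistency).
# "recaptioned" marks rows evaluated with recaptioned text-image pairs.
# All names here are illustrative, not part of the FlashI2V codebase.
RESULTS = {
    "SVD-XT-1.0 (1.5B)": (52.36, 97.52),
    "SVD-XT-1.1 (1.5B)": (43.17, 97.51),
    "SEINE-512x512 (1.8B)": (27.07, 97.15),
    "CogVideoX-5B-I2V": (33.17, 97.19),
    "Wan2.1-I2V-14B-720P": (51.38, 96.95),
    "CogVideoX1.5-5B-I2V (recaptioned)": (37.48, 97.78),
    "Wan2.1-I2V-14B-480P (recaptioned)": (45.20, 97.83),
    "FlashI2V (1.3B, recaptioned)": (53.01, 97.67),
}


def rank_by_dynamic_degree(results):
    """Return model names sorted from highest to lowest Dynamic Degree."""
    return sorted(results, key=lambda name: results[name][0], reverse=True)


ranking = rank_by_dynamic_degree(RESULTS)
for name in ranking:
    dynamic, i2v_subject = RESULTS[name]
    print(f"{name:36s} dynamic={dynamic:5.2f}  i2v_subject={i2v_subject:5.2f}")
```

On these reported numbers, FlashI2V (1.3B) ranks first on Dynamic Degree. Rows evaluated with recaptioned pairs may not be directly comparable with the others.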
To cite our work, please use:
@misc{ge2025flashi2v,
title={FlashI2V: Fourier-Guided Latent Shifting Prevents Conditional Image Leakage in Image-to-Video Generation},
author={Yunyang Ge and Xinhua Cheng and Chengshu Zhao and Xianyi He and Shenghai Yuan and Bin Lin and Bin Zhu and Li Yuan},
year={2025},
eprint={2509.25187},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2509.25187},
}