video-analysis
VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding
Paper
• 2507.13353
• Published
• 1
Kwai Keye-VL Technical Report
Paper
• 2507.01949
• Published
• 131
UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks
Paper
• 2507.11336
• Published
• 7
Attention is all you need for Videos: Self-attention based Video Summarization using Universal Transformers
Paper
• 1906.02792
• Published
Rethinking the Evaluation of Video Summaries
Paper
• 1903.11328
• Published
Self-supervised pre-training and contrastive representation learning for multiple-choice video QA
Paper
• 2009.08043
• Published
Video Representation Learning with Visual Tempo Consistency
Paper
• 2006.15489
• Published
Video Representation Learning by Recognizing Temporal Transformations
Paper
• 2007.10730
• Published
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Paper
• 2112.09583
• Published
Paper
• 2106.13230
• Published
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training
Paper
• 2212.14546
• Published
Video-Text Retrieval by Supervised Sparse Multi-Grained Learning
Paper
• 2302.09473
• Published
• 1
Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos
Paper
• 2303.08345
• Published
Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos
Paper
• 2303.07224
• Published
VideoXum: Cross-modal Visual and Textural Summarization of Videos
Paper
• 2303.12060
• Published
Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models
Paper
• 2308.09363
• Published
Multi-event Video-Text Retrieval
Paper
• 2308.11551
• Published
Video ReCap: Recursive Captioning of Hour-Long Videos
Paper
• 2402.13250
• Published
• 26
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Paper
• 2405.21075
• Published
• 26
Video Editing for Video Retrieval
Paper
• 2402.02335
• Published
Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering
Paper
• 2401.10711
• Published
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
Paper
• 2404.12353
• Published
Video Captioning with Aggregated Features Based on Dual Graphs and Gated Fusion
Paper
• 2308.06685
• Published
• 1
Moving Object Based Collision-Free Video Synopsis
Paper
• 2401.02419
• Published
• 1
HawkEye: Training Video-Text LLMs for Grounding Text in Videos
Paper
• 2403.10228
• Published
• 1
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models
Paper
• 2311.16103
• Published
• 1
Multi-Modal Video Topic Segmentation with Dual-Contrastive Domain Adaptation
Paper
• 2312.00220
• Published
• 1
Conditional Modeling Based Automatic Video Summarization
Paper
• 2311.12159
• Published
• 1
Tunnel Try-on: Excavating Spatial-temporal Tunnels for High-quality Virtual Try-on in Videos
Paper
• 2404.17571
• Published
• 1
vid-TLDR: Training Free Token merging for Light-weight Video Transformer
Paper
• 2403.13347
• Published
• 1
Boost Video Frame Interpolation via Motion Adaptation
Paper
• 2306.13933
• Published
• 3
VcLLM: Video Codecs are Secretly Tensor Codecs
Paper
• 2407.00467
• Published
• 2
Video Understanding with Large Language Models: A Survey
Paper
• 2312.17432
• Published
• 3
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Paper
• 2403.09626
• Published
• 15
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models
Paper
• 2306.05424
• Published
• 7
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models
Paper
• 2311.13435
• Published
• 18
Towards Retrieval Augmented Generation over Large Video Libraries
Paper
• 2406.14938
• Published
• 22