video-analysis - a prisar Collection

Updated Aug 12, 2025

VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding

Paper • 2507.13353 • Published Jul 17, 2025 • 1
Kwai Keye-VL Technical Report

Paper • 2507.01949 • Published Jul 2, 2025 • 131
UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks

Paper • 2507.11336 • Published Jul 15, 2025 • 7
Attention is all you need for Videos: Self-attention based Video Summarization using Universal Transformers

Paper • 1906.02792 • Published Jun 6, 2019
Rethinking the Evaluation of Video Summaries

Paper • 1903.11328 • Published Mar 27, 2019
Self-supervised pre-training and contrastive representation learning for multiple-choice video QA

Paper • 2009.08043 • Published Sep 17, 2020
Video Representation Learning with Visual Tempo Consistency

Paper • 2006.15489 • Published Jun 28, 2020
Video Representation Learning by Recognizing Temporal Transformations

Paper • 2007.10730 • Published Jul 21, 2020
Align and Prompt: Video-and-Language Pre-training with Entity Prompts

Paper • 2112.09583 • Published Dec 17, 2021
Video Swin Transformer

Paper • 2106.13230 • Published Jun 24, 2021
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training

Paper • 2212.14546 • Published Dec 30, 2022
Video-Text Retrieval by Supervised Sparse Multi-Grained Learning

Paper • 2302.09473 • Published Feb 19, 2023 • 1
Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos

Paper • 2303.08345 • Published Mar 15, 2023
Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos

Paper • 2303.07224 • Published Mar 13, 2023
VideoXum: Cross-modal Visual and Textural Summarization of Videos

Paper • 2303.12060 • Published Mar 21, 2023
Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models

Paper • 2308.09363 • Published Aug 18, 2023
Multi-event Video-Text Retrieval

Paper • 2308.11551 • Published Aug 22, 2023
Video ReCap: Recursive Captioning of Hour-Long Videos

Paper • 2402.13250 • Published Feb 20, 2024 • 26
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Paper • 2405.21075 • Published May 31, 2024 • 26
Video Editing for Video Retrieval

Paper • 2402.02335 • Published Feb 4, 2024
Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering

Paper • 2401.10711 • Published Jan 19, 2024
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning

Paper • 2404.12353 • Published Apr 18, 2024
Video Captioning with Aggregated Features Based on Dual Graphs and Gated Fusion

Paper • 2308.06685 • Published Aug 13, 2023 • 1
Moving Object Based Collision-Free Video Synopsis

Paper • 2401.02419 • Published Sep 17, 2023 • 1
HawkEye: Training Video-Text LLMs for Grounding Text in Videos

Paper • 2403.10228 • Published Mar 15, 2024 • 1
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models

Paper • 2311.16103 • Published Nov 27, 2023 • 1
Multi-Modal Video Topic Segmentation with Dual-Contrastive Domain Adaptation

Paper • 2312.00220 • Published Nov 30, 2023 • 1
Conditional Modeling Based Automatic Video Summarization

Paper • 2311.12159 • Published Nov 20, 2023 • 1
Tunnel Try-on: Excavating Spatial-temporal Tunnels for High-quality Virtual Try-on in Videos

Paper • 2404.17571 • Published Apr 26, 2024 • 1
vid-TLDR: Training Free Token merging for Light-weight Video Transformer

Paper • 2403.13347 • Published Mar 20, 2024 • 1
Boost Video Frame Interpolation via Motion Adaptation

Paper • 2306.13933 • Published Jun 24, 2023 • 3
VcLLM: Video Codecs are Secretly Tensor Codecs

Paper • 2407.00467 • Published Jun 29, 2024 • 2
Video Understanding with Large Language Models: A Survey

Paper • 2312.17432 • Published Dec 29, 2023 • 3
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

Paper • 2403.09626 • Published Mar 14, 2024 • 15
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models

Paper • 2306.05424 • Published Jun 8, 2023 • 7
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models

Paper • 2311.13435 • Published Nov 22, 2023 • 18
Towards Retrieval Augmented Generation over Large Video Libraries

Paper • 2406.14938 • Published Jun 21, 2024 • 22