arxiv:2506.00979

IVY-FAKE: A Unified Explainable Framework and Benchmark for Image and Video AIGC Detection

Published on Jun 1, 2025
Submitted by Changjiang Jiang on Jun 3, 2025

Abstract

AIGC detection is hampered by the scarcity of multidimensional, explainable datasets and by the limited interpretability of existing methods. This work addresses both gaps with Ivy-Fake, a large-scale multimodal benchmark, and Ivy-xDetector, a GRPO-based reinforcement learning model that produces explainable reasoning and improves detection performance.

AI-generated summary

The rapid development of Artificial Intelligence Generated Content (AIGC) techniques has enabled the creation of high-quality synthetic content, but it also raises significant security concerns. Current detection methods face two major limitations: (1) the lack of multidimensional explainable datasets for generated images and videos. Existing open-source datasets (e.g., WildFake, GenVideo) rely on oversimplified binary annotations, which restrict the explainability and trustworthiness of trained detectors. (2) Prior MLLM-based forgery detectors (e.g., FakeVLM) exhibit insufficiently fine-grained interpretability in their step-by-step reasoning, which hinders reliable localization and explanation. To address these challenges, we introduce Ivy-Fake, the first large-scale multimodal benchmark for explainable AIGC detection. It consists of over 106K richly annotated training samples (images and videos) and 5,000 manually verified evaluation examples, sourced from multiple generative models and real-world datasets through a carefully designed pipeline to ensure both diversity and quality. Furthermore, we propose Ivy-xDetector, a reinforcement learning model based on Group Relative Policy Optimization (GRPO), capable of producing explainable reasoning chains and achieving robust performance across multiple synthetic content detection benchmarks. Extensive experiments demonstrate the superiority of our dataset and confirm the effectiveness of our approach. Notably, our method improves performance on GenImage from 86.88% to 96.32%, surpassing prior state-of-the-art methods by a clear margin.

Community

Paper author Paper submitter
•
edited Jun 4, 2025

Project page: https://pi3ai.github.io/IvyFake

🚀 This paper introduces IVY-FAKE, a groundbreaking framework to tackle the rapidly growing challenge of detecting sophisticated AI-generated images and videos. Current detection methods often act as black boxes and struggle to handle both images and videos seamlessly. IVY-FAKE offers a unified and explainable benchmark!

😆 Takeaways:

  1. Unified Multimodal Dataset (IVY-FAKE): The first large-scale benchmark designed for explainable AIGC detection across both images and videos. It boasts over 150,000 richly annotated training samples and 18,700 evaluation examples, going beyond simple "real/fake" labels to include detailed natural-language reasoning. This addresses the fragmented modality coverage and sparse annotations of previous datasets.
  2. Explainable Detector (IVY-XDETECTOR): A novel vision-language architecture that performs joint detection and explanation for both image and video content. Unlike models that output only coordinates or heatmaps, IVY-XDETECTOR provides human-readable natural-language descriptions of visual artifacts.
  3. Addressing "Black Box" Limitations: Many existing AIGC detectors are binary classifiers with limited interpretability, hindering transparency and trust. IVY-FAKE and IVY-XDETECTOR are designed to overcome this.
  4. Rich Annotations and Progressive Training: The dataset includes detailed reasoning, enabling a more nuanced evaluation of models' interpretability and explanatory capabilities. Annotations were generated using Gemini 2.5 Pro with a structured approach to articulate reasoning before conclusions; IVY-XDETECTOR uses a three-stage training pipeline: 1) General video understanding, 2) AIGC detection fine-tuning for binary classification, and 3) Joint optimization for detection and explainability.
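For readers unfamiliar with GRPO, its core idea is to standardize each sampled response's reward within its own group of samples, so no separate critic network is needed. The following is a minimal illustrative sketch of that advantage computation (function names and the rule-based reward are assumptions for illustration, not the authors' implementation):

```python
# Illustrative sketch of the group-relative advantage at the heart of
# GRPO (Group Relative Policy Optimization). Names are hypothetical.
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against its group's mean and std.

    In GRPO, several responses are sampled per prompt; each response's
    advantage is its reward normalized within that group, which removes
    the need for a learned value (critic) network.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled detection explanations scored by a rule-based
# reward (e.g., correct real/fake verdict plus a well-formed reasoning chain).
rewards = [1.0, 0.0, 1.0, 0.0]
advs = group_relative_advantages(rewards)
print(advs)  # responses above the group mean get positive advantages
```

Responses scoring above the group mean receive positive advantages and are reinforced; those below are suppressed, which is how the explainable reasoning chains get shaped without per-token value estimates.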

This work provides a significant step towards more transparent and trustworthy AI content analysis, offering a robust foundation for future research in multimodal AIGC detection.


Amazing!!

Paper author Paper submitter

📊 Ivy-Fake Dataset Evaluation Script

Official evaluation script for the Ivy-Fake dataset. You are welcome to run the evaluation and share your results.


🔗 Quick Links

  • 📂 Script Repository: github.com/Pi3AI/Ivy-Fake
  • 📝 Documentation: see the repository's README
  • 🐛 Issues/bugs: open a GitHub Issue

