DFF: InstructBLIP-based Explainable DeepFake Detection

📖 Model Description

This is the core DFF (DeepFake Detection and Forensic Explanation Framework) model described in the ACL 2026 paper "Generating Attribution Reports for Manipulated Facial Images: A Dataset and Baseline".

DFF is built upon the InstructBLIP (Flan-T5 XL) architecture. By integrating the Face-ViT auxiliary classifier, it achieves state-of-the-art performance in both forgery localization (mask generation) and forensic explanation (captioning).

🌟 Key Capabilities

  1. Forgery Localization: Generates high-resolution binary masks highlighting manipulated facial regions.
  2. Natural Language Explanation: Produces detailed text describing why a specific image is considered a forgery (e.g., "The texture around the eyes is unnatural due to GAN-based blending").
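
The two outputs listed above can be pictured as a single report object. The following is a minimal sketch, not the released DFF interface: the names `AttributionReport`, `binarize`, and `manipulated_fraction` are hypothetical, and it assumes the localization head yields a per-pixel probability map that is thresholded into a binary mask.

```python
from dataclasses import dataclass
from typing import List

Mask = List[List[int]]  # 1 = manipulated pixel, 0 = untouched pixel


def binarize(prob_map: List[List[float]], threshold: float = 0.5) -> Mask:
    """Threshold a per-pixel probability map into a binary mask.

    The 0.5 default is an illustrative assumption, not a value from the paper.
    """
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]


def manipulated_fraction(mask: Mask) -> float:
    """Return the share of pixels flagged as manipulated (0.0 for an empty mask)."""
    total = sum(len(row) for row in mask)
    flagged = sum(sum(row) for row in mask)
    return flagged / total if total else 0.0


@dataclass
class AttributionReport:
    """Hypothetical container pairing the localization mask with its explanation."""
    mask: Mask
    explanation: str


# Toy 2x2 probability map standing in for a real model output.
toy_probs = [[0.1, 0.8], [0.6, 0.2]]
report = AttributionReport(
    mask=binarize(toy_probs),
    explanation="The texture around the eyes is unnatural due to GAN-based blending",
)
print(report.mask, manipulated_fraction(report.mask))
```

Keeping the mask and the text in one object makes it easy to check that an explanation is always accompanied by the region it refers to.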

🛠️ Model Details

  • Base LLM: Flan-T5 XL.
  • Visual Encoder: EVA-ViT-G.
  • Auxiliary Module: Face-ViT (Multi-label perception).
  • Task: Explainable Detection & Multi-modal Attribution Reporting.
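
For orientation, the backbone named above (InstructBLIP with Flan-T5 XL and an EVA-ViT-G visual encoder) is available in the Hugging Face `transformers` library. The sketch below loads that public base checkpoint only. It is an assumption that this is a useful starting point: it does not load the DFF weights, the Face-ViT module, or the mask head, and the helper names `load_base_instructblip` and `describe_image` are hypothetical.

```python
# Public base checkpoint for InstructBLIP (Flan-T5 XL). This is NOT the DFF model.
BASE_CHECKPOINT = "Salesforce/instructblip-flan-t5-xl"


def load_base_instructblip(checkpoint: str = BASE_CHECKPOINT):
    """Load the base InstructBLIP processor and model (several GB of weights)."""
    # Imported lazily so that importing this snippet does not require transformers.
    from transformers import (
        InstructBlipForConditionalGeneration,
        InstructBlipProcessor,
    )

    processor = InstructBlipProcessor.from_pretrained(checkpoint)
    model = InstructBlipForConditionalGeneration.from_pretrained(checkpoint)
    return processor, model


def describe_image(processor, model, image, prompt: str, max_new_tokens: int = 64) -> str:
    """Generate text for a PIL image and an instruction prompt with the base model."""
    inputs = processor(images=image, text=prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
```

A call would look like `describe_image(processor, model, pil_image, "Is this face manipulated? Explain.")`, where the prompt wording is illustrative. The base model has not been trained for forensic explanation, so its answers should not be read as DFF outputs.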

🚀 Links

📜 Citation

@inproceedings{lian2026generating,
  title={Generating Attribution Reports for Manipulated Facial Images: A Dataset and Baseline},
  author={Lian, Jingchun and others},
  booktitle={Proceedings of ACL},
  year={2026},
  note={To appear}
}