FP8 quantized models of Anima

!! Currently, FP8, MXFP8 and NVFP4 do not work properly with torch.compile, so it is better to use the original BF16 model. !!

There are two models: FP8 and NVFP4Mixed.

FP8 (2.4GB): recommended. Maximizes generation speed while preserving as much quality as possible.
NVFP4Mixed (2.0GB): a mixture of FP8 and NVFP4 layers, with marginal quality.

To use torch.compile, use the TorchCompileModelAdvanced node from KJNodes, set the mode to max-autotune-no-cudagraphs, and make sure dynamic is set to false.

Generation speed

Tested on

RTX 5090 (400W), ComfyUI with the --fast option, torch 2.10.0+cu130
Generating 832x1216 images, 30 steps, CFG 4.0, er_sde sampler, simple scheduler

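quant     no optimization            SageAttention + torch.compile
bf16      7.13 s / 4.21 it/s         5.16 s / 5.81 it/s (+38%)
fp8       6.66 s / 4.50 it/s (+11%)  4.52 s / 6.64 it/s (+58%)
nvfp4mix  6.37 s / 4.71 it/s (+12%)  4.99 s / 6.01 it/s (+43%)

Percentages are speedups relative to bf16 with no optimization.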
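The it/s and percentage figures can be recomputed from the seconds column. This sketch assumes it/s is just 30 steps divided by the total time, and that each percentage is the speedup over the bf16 run with no optimization.

```python
STEPS = 30  # sampling steps used in the benchmark

# Total seconds from the table: (no optimization, sage + torch.compile)
SECONDS = {
    "bf16": (7.13, 5.16),
    "fp8": (6.66, 4.52),
    "nvfp4mix": (6.37, 4.99),
}


def iterations_per_second(seconds: float, steps: int = STEPS) -> float:
    """Assumption: it/s is steps divided by total sampling time."""
    return steps / seconds


def speedup_percent(seconds: float, baseline_seconds: float) -> int:
    """Speedup over the baseline run, rounded to a whole percent."""
    return round((baseline_seconds / seconds - 1) * 100)


baseline = SECONDS["bf16"][0]  # bf16, no optimization
for quant, (plain, compiled) in SECONDS.items():
    print(
        f"{quant:9} "
        f"{iterations_per_second(plain):.2f} it/s ({speedup_percent(plain, baseline):+d}%)  "
        f"{iterations_per_second(compiled):.2f} it/s ({speedup_percent(compiled, baseline):+d}%)"
    )
```

Every recomputed it/s value matches the table, and so do the percentages except the fp8 no-optimization entry: both 6.66 s and 4.50 it/s work out to about +7% over bf16, not the listed +11%.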

Sample

[Sample images comparing bf16, fp8 and nvfp4mixed outputs for anima-preview3-base, anima-preview2 and anima-preview]

Quantized layers

fp8

{
  "format": "comfy_quant",
  "block_names": ["net.blocks."],
  "rules": [
    { "policy": "keep", "match": ["blocks.0", "blocks.1."] },
    { "policy": "float8_e4m3fn", "match": ["q_proj", "k_proj", "v_proj", "o_proj", "output_proj", ".mlp"] },
    { "policy": "nvfp4", "match": [] }
  ]
}

nvfp4mixed

{
  "format": "comfy_quant",
  "block_names": ["net.blocks."],
  "rules": [
    { "policy": "keep", "match": ["blocks.0."] },
    { "policy": "float8_e4m3fn", "match": ["v_proj", "adaln_modulation", ".mlp"] },
    { "policy": "nvfp4", "match": ["k_proj", "q_proj", "output_proj"] }
  ]
}
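The two rule sets can be read as a per-layer lookup. The sketch below is a hypothetical resolver, not the actual comfy_quant tooling. It assumes that `block_names` and `match` entries are plain substring matches, that rules are tried in the order listed with the first match winning, and that unmatched layers stay in their original precision (reported as "keep"). The layer names are illustrative only.

```python
from typing import Any

# Rule sets copied from the two configs above.
FP8_CONFIG: dict[str, Any] = {
    "block_names": ["net.blocks."],
    "rules": [
        {"policy": "keep", "match": ["blocks.0", "blocks.1."]},
        {"policy": "float8_e4m3fn",
         "match": ["q_proj", "k_proj", "v_proj", "o_proj", "output_proj", ".mlp"]},
        {"policy": "nvfp4", "match": []},
    ],
}

NVFP4_MIXED_CONFIG: dict[str, Any] = {
    "block_names": ["net.blocks."],
    "rules": [
        {"policy": "keep", "match": ["blocks.0."]},
        {"policy": "float8_e4m3fn", "match": ["v_proj", "adaln_modulation", ".mlp"]},
        {"policy": "nvfp4", "match": ["k_proj", "q_proj", "output_proj"]},
    ],
}


def resolve_policy(layer_name: str, config: dict[str, Any]) -> str:
    """Return the policy for one layer (hypothetical semantics, see lead-in)."""
    # Assumption: layers outside block_names are never quantized.
    if not any(block in layer_name for block in config["block_names"]):
        return "keep"
    # Assumption: the first rule with a substring match wins.
    for rule in config["rules"]:
        if any(pattern in layer_name for pattern in rule["match"]):
            return rule["policy"]
    # Assumption: layers matched by no rule keep their original precision.
    return "keep"


# Hypothetical layer names, for illustration only.
for name in [
    "net.blocks.0.self_attn.q_proj",
    "net.blocks.1.self_attn.output_proj",
    "net.blocks.5.self_attn.q_proj",
    "net.blocks.5.adaln_modulation_self_attn.1",
    "net.blocks.10.mlp.layer1",
    "net.final_layer.linear",
]:
    print(f"{name:45} fp8={resolve_policy(name, FP8_CONFIG):15} "
          f"nvfp4mixed={resolve_policy(name, NVFP4_MIXED_CONFIG)}")
```

Under these assumptions the trailing dot in "blocks.1." matters: it keeps block 1 unquantized without also catching blocks 10 to 19.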

Model tree for Bedovyy/Anima-FP8

Base model: circlestone-labs/Anima