It works blazingly fast

by InsecureErasure - opened 4 days ago

I just tested 2602SY_ZImageTurbo-nvfp4.safetensors and I must say the results are impressive: 9-step generation of a 1024x1536 image on a poor man's RTX 5060 with 8 GB VRAM in under 9 seconds is insane! I was using Q5_K_S quants which required at least triple the time. NVFP4 is the way to go for low-end 50xx cards.

I was wondering which quantization method you used. I usually resort to convert_to_quant from silveroxides.

Thanks a lot!

ApacheOne

Owner 4 days ago

> I was wondering which quantization method you used. I usually resort to convert_to_quant from silveroxides.

convert_to_quant is essentially the same as my method. The difference shows up when working with models outside Comfy: there, Nvidia's Model Optimizer covers a wider range of model types.

I started from a fork of tritant/ComfyUI_Kitchen_nvfp4_Converter. The Colab I used for 2602SY_ZImageTurbo-nvfp4.safetensors is at Apache0ne/ComfyUI_Kitchen_nvfp4_Converter_colab (you would need to swap some settings).

I wanted something lightweight so I could change how I quantize each layer or support new models quickly. The general loop is: set up comfy-kitchen or Nvidia's modelopt depending on the model, download the model and dump its keys, send the key dump and the convert script to an LLM, then convert and test.
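For anyone who wants to try the "dump the model's keys" step, here is a minimal sketch. It is not the script from the Colab; it only relies on the public safetensors layout (an 8-byte little-endian header length followed by a JSON header), and the function names `read_safetensors_header`, `dump_keys`, and `dtype_summary` are made up for illustration.

```python
import json
import struct
from collections import Counter


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict.

    Only the header is read, so no tensor data is loaded into memory.
    """
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def dump_keys(path):
    """Return one tab-separated line per tensor: name, dtype, shape."""
    header = read_safetensors_header(path)
    return [
        f"{name}\t{info['dtype']}\t{info['shape']}"
        for name, info in sorted(header.items())
    ]


def dtype_summary(path):
    """Count how many tensors use each dtype (hypothetical helper)."""
    header = read_safetensors_header(path)
    return Counter(info["dtype"] for info in header.values())
```

Something like `print("\n".join(dump_keys("model.safetensors")))` then gives a plain-text key dump that can be pasted next to the convert script. The file name here is a placeholder.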

Just extra information on where things are going if interested:

I have newer scripts that I will get around to posting at some point. The end goal is to collect a large set of key dumps from all model types, build every per-layer combination we can, and then have an LLM improve on them with custom kernels.
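As a rough illustration of what "per-layer combos" could mean, the sketch below enumerates every assignment of one format to each layer group. The group names, the format list, and the helper `per_layer_combos` are all assumptions for the example, not the actual scripts or the set of formats they support.

```python
from itertools import product

# Hypothetical layer groups and candidate formats, for illustration only.
LAYER_GROUPS = ["attention", "mlp", "norm"]
FORMATS = ["nvfp4", "fp8", "bf16"]


def per_layer_combos(groups, formats):
    """Yield every assignment of one format to each layer group."""
    for choice in product(formats, repeat=len(groups)):
        yield dict(zip(groups, choice))


combos = list(per_layer_combos(LAYER_GROUPS, FORMATS))
print(len(combos))  # 3 formats ** 3 groups = 27
```

The count grows as formats to the power of groups, so in practice the groups would have to stay coarse or the search would need pruning.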

I would imagine QuantFunc has a similar workflow going; they seem to have slightly improved on SVDQ's massive CPU bottleneck. So my goal is a little outdated already, as the new lightning runtime methods are smaller, faster, hold a bit more quality, and support the same range of Nvidia GPUs as NVFP4.
The issues are that SVDQ still has a CPU bottleneck, which I can't work with because I lack the CPU time to burn, and that the runtime engine for QuantFunc isn't fully public. Still, it's nice to have as a further finish line for comparing NVFP4 against SVDQ.
