Idefics2-pretraining
Hi,
There does not seem to be any support for pre-training.
When I try, there seems to be some instability with the Connector. How did you initialize your weights?
Hi @orrzohar
Can you say more about the instability you are seeing?
Our initialization scheme for newly initialized parameters is rather standard. The code snippet below should give you a good idea:
if isinstance(module, MLP):
    for sub_module_name, sub_module in module.named_modules():
        if isinstance(sub_module, nn.Linear):
            factor = 1.0
            if "down_proj" in sub_module_name:
                factor = 2.0
            init_a_linear(sub_module, std=(0.4 / (self.config.hidden_size * factor)) ** 0.5)
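To get a feel for the scale this expression produces, the sketch below evaluates it in plain Python. The function name `connector_linear_std` and the hidden size of 4096 are assumptions made for illustration; neither comes from the thread.

```python
def connector_linear_std(hidden_size: int, sub_module_name: str) -> float:
    """Std for a Linear layer, following the expression in the snippet above."""
    # down_proj layers use factor 2, which shrinks their std by sqrt(2).
    factor = 2.0 if "down_proj" in sub_module_name else 1.0
    return (0.4 / (hidden_size * factor)) ** 0.5


hidden_size = 4096  # assumed value, for illustration only
for name in ("gate_proj", "down_proj"):
    print(f"{name}: std = {connector_linear_std(hidden_size, name):.6f}")
```

With this assumed hidden size the std comes out just under 0.01 for gate_proj, and smaller by a factor of sqrt(2) for down_proj.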
Hi Victor,
Thank you for your response!
What I am seeing is that the loss initially decreases, but then NaNs are detected after the "connector" (MLP + Perceiver Pooler). I have tried xavier_uniform_ and kaiming_uniform_ for all the connector weights, without success.
I have tried the obvious: varying the batch size (2 to 1000) and the learning rate (1e-3 to 1e-6).
It is extremely regular: the NaNs seem to appear at the same iteration for a given batch size, no matter the learning rate. The only time this does not occur is with a batch size of 1.
Have you ever experienced something similar, and if so, how did you debug it?
Best,
Orr
Indeed, NaNs are never a good sign...
Before I answer, a few questions:
- are you fine-tuning or training from scratch?
- what data?
- mixed precision? what precision?
- is it specifically after the connector? any details as to where in the connector?
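To narrow down where in the connector the first non-finite value appears, one option is to register forward hooks on every submodule and record those whose output contains NaN or Inf. The sketch below is a minimal example under stated assumptions: `attach_nonfinite_hooks` is a hypothetical helper, and the toy model only borrows the names gate_proj and down_proj; it is not the Idefics2 connector.

```python
import torch
from torch import nn


def attach_nonfinite_hooks(model: nn.Module):
    """Register forward hooks that record modules whose output has NaN/Inf."""
    offenders = []  # module names, in the order their forward pass finished
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            outs = output if isinstance(output, (tuple, list)) else (output,)
            for out in outs:
                if torch.is_tensor(out) and not torch.isfinite(out).all():
                    offenders.append(name)
                    break
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module, whose name is ""
            handles.append(module.register_forward_hook(make_hook(name)))
    return offenders, handles


# Toy stand-in for an MLP block (hypothetical sizes).
toy = nn.Sequential()
toy.add_module("gate_proj", nn.Linear(4, 8))
toy.add_module("act", nn.SiLU())
toy.add_module("down_proj", nn.Linear(8, 4))

# Poison one weight so that gate_proj emits NaN.
with torch.no_grad():
    toy.gate_proj.weight[0, 0] = float("nan")

offenders, handles = attach_nonfinite_hooks(toy)
toy(torch.ones(2, 4))
for handle in handles:
    handle.remove()  # detach the hooks once the check is done

print(offenders[0])  # first module that produced a non-finite output
```

The first entry of `offenders` is the earliest module in the forward pass whose output was non-finite; later entries are usually just downstream propagation.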
Hi @VictorSanh ,
- I am training from scratch
- LLaVA 1.5
- BF16
- It is usually in the MLP of the Idefics2PerceiverLayer, usually after "gate_proj", very rarely after "down_proj".
I tried your initialization code, increasing the batch size to 4096 and reducing the lr to 1e-6, but with no luck. When investigating the issue further, I noticed that the 'latents' remain all-ones even after training has run for a few hundred iterations. I am sure that the parameters are added to the optimizer. I tried randomly initializing them instead, but that did not solve the issue.
Best,
Orr
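One thing that may be worth ruling out for latents that stay at exactly one is bfloat16 rounding. This is a hypothesis, not something confirmed in this thread, and it only applies if the parameters themselves are stored in BF16 with no FP32 master copy. bfloat16 keeps 8 significant bits, so the representable values next to 1.0 are 0.99609375 and 1.0078125, and any smaller update rounds straight back to 1.0. The sketch below emulates BF16 rounding in plain Python; `to_bf16` is a hypothetical helper, and the fixed step of 1e-3 is an assumed stand-in for an optimizer update.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (ties to even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 is the top 16 bits of float32; the bias turns truncation into rounding.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


step = 1e-3  # assumed update magnitude (lr * update), for illustration only
latent_bf16 = 1.0
latent_hp = 1.0
for _ in range(300):
    latent_bf16 = to_bf16(latent_bf16 - step)  # weight stored in BF16
    latent_hp = latent_hp - step               # weight kept in higher precision

print(latent_bf16)
print(latent_hp)
```

In the emulated BF16 case the value never leaves 1.0, while the higher-precision copy drifts to about 0.7. If this turns out to be the cause, keeping the latents (or a master copy of them) in FP32 would let small updates accumulate.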