How to use google/siglip-so400m-patch14-384 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("zero-shot-image-classification", model="google/siglip-so400m-patch14-384")
pipe(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png",
    candidate_labels=["animals", "humans", "landscape"],
)

# Load model directly
from transformers import AutoProcessor, AutoModelForZeroShotImageClassification

processor = AutoProcessor.from_pretrained("google/siglip-so400m-patch14-384")
model = AutoModelForZeroShotImageClassification.from_pretrained("google/siglip-so400m-patch14-384")
```
Model performance is not adequate.
When using zero-shot models for image classification, it's important to use adequate prompts. Among the labels you provided, the model assigned the highest probability to the 'cat' label, which is correct. The probability itself is low because the picture can't be described simply as 'cat'. If you use a prompt such as 'a picture of a cat printed on a box', the probability will be much higher.
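To make the prompt advice concrete, here is a minimal sketch of turning bare class names into descriptive prompts. The helper `build_prompts`, its default template, and the 'dog' label are hypothetical and only for illustration; only 'cat' and the box prompt come from this thread.

```python
def build_prompts(labels, template="a picture of a {}"):
    """Turn bare class names into descriptive text prompts."""
    return [template.format(label) for label in labels]


# Hypothetical label set; 'dog' is invented for illustration.
bare_labels = ["cat", "dog"]
prompts = build_prompts(bare_labels)

# Add a prompt that describes the actual scene more precisely.
prompts.append("a picture of a cat printed on a box")

for p in prompts:
    print(p)
```

These strings could then be passed as `candidate_labels` to the zero-shot pipeline. As far as I know the pipeline also wraps each label in its own `hypothesis_template`, so setting `hypothesis_template="{}"` should avoid wrapping the text twice, but check that against your installed version.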
Indeed, see also the discussion here: https://huggingface.co/timm/ViT-SO400M-14-SigLIP-384/discussions/3
I answered over there: this is expected and not a problem. You need to either softmax the output or calibrate to your data/task, depending on what exactly you want to do. Doing so is pretty easy: https://huggingface.co/timm/ViT-SO400M-14-SigLIP-384/discussions/3#65f964b748d4f7baa4f1858d
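A rough sketch of why the softmax helps, using plain Python. SigLIP is trained with a sigmoid loss, so each image-text pair gets an independent probability, and these can all be low at once. A softmax over the candidate labels instead gives relative scores that sum to 1. The logits below are made up for illustration and are not real model outputs. This is not necessarily the exact recipe from the linked answer.

```python
import math


def sigmoid(x):
    """Independent per-pair probability, as in SigLIP's native scoring."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    """Relative probabilities over the candidate labels (sums to 1)."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical image-text logits, invented for illustration only.
labels = ["cat", "dog", "car"]
logits = [-2.0, -6.0, -8.0]

independent = [sigmoid(z) for z in logits]
relative = softmax(logits)

for label, p_sig, p_soft in zip(labels, independent, relative):
    print(f"{label}: sigmoid={p_sig:.3f} softmax={p_soft:.3f}")
```

With these invented numbers, the top label has a low sigmoid probability but dominates after the softmax. The ranking is the same either way, and only the scale changes. If you need absolute, thresholdable scores rather than a choice among fixed labels, calibrating on your own data is the other option mentioned above.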
