It seems no CLIP models are deployed serverlessly right now…
What changed in the 2026 router setup
Hugging Face now exposes two distinct “surfaces” under router.huggingface.co, and mixing them up is the most common cause of “404 Not Found → Unexpected token N (Not Found)”:
- OpenAI-compatible `/v1/*`
  - Chat-only today (by Hugging Face’s own note).
  - If you call `/v1/embeddings`, you should expect 404 in many cases because embeddings are not offered via the OpenAI-compatible surface. (Hugging Face)
- Task/pipeline routes (Serverless / HF Inference provider)
  - This is where “feature extraction = embeddings” lives. Hugging Face’s Inference Providers docs define Feature Extraction specifically as “convert text into a vector (embedding)”. (Hugging Face)
  - For the legacy serverless provider (“HF Inference”), the working pattern is the pipeline URL shown below.
The currently supported URL structure for serverless text embeddings (feature-extraction)
Canonical router URL (HF Inference provider)
```
POST https://huggingface.co/proxy/router.huggingface.co/hf-inference/models/{MODEL_ID}/pipeline/feature-extraction
```
This is explicitly confirmed (with a concrete curl example) for sentence-transformers/all-MiniLM-L6-v2 in the model’s pinned update. (Hugging Face)
Request format (same shape as legacy)
- Header: `Authorization: Bearer <HF_TOKEN>`
- Header: `Content-Type: application/json`
- Body: `{"inputs": ["text1", "text2"]}` (or a single string)

That matches what you already got working (200 + vectors).
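If you want to exercise that route from Python rather than curl, here is a minimal sketch, assuming an `HF_TOKEN` environment variable and the `requests` library (the model ID is just the one from the pinned update above):

```python
# Minimal sketch: call the HF Inference pipeline route for feature extraction.
# Assumes HF_TOKEN is set in the environment; the model ID is illustrative.
import os
import requests

MODEL_ID = "sentence-transformers/all-MiniLM-L6-v2"
url = f"https://huggingface.co/proxy/router.huggingface.co/hf-inference/models/{MODEL_ID}/pipeline/feature-extraction"

resp = requests.post(
    url,
    headers={
        "Authorization": f"Bearer {os.environ['HF_TOKEN']}",
        "Content-Type": "application/json",
    },
    json={"inputs": ["text1", "text2"]},  # a single string also works
    timeout=30,
)
resp.raise_for_status()
vectors = resp.json()  # one embedding vector per input
print(len(vectors), len(vectors[0]))
```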
Why /v1/models doesn’t list all-MiniLM-L6-v2 (but inference still works)
GET https://huggingface.co/proxy/router.huggingface.co/v1/models is part of the OpenAI-compatible chat surface; the docs present it in the “OpenAI-compatible chat completions endpoint” section and also state that this OpenAI-compatible endpoint is chat tasks only. (Hugging Face)
For HF Inference models, use the Hub listing / Hub API instead (a hedged sketch follows below).
So a model being absent from /v1/models is expected, and it does not contradict successful calls to the pipeline route.
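One possible way to list feature-extraction models served by the HF Inference provider via the Hub API; the `inference_provider` filter name is an assumption on my part, so check the current Hub API reference before relying on it:

```python
# Hedged sketch: list feature-extraction models served by the HF Inference provider.
# The `inference_provider` filter name is an assumption; verify it against the Hub API docs.
import requests

resp = requests.get(
    "https://huggingface.co/proxy/huggingface.co/api/models",
    params={
        "pipeline_tag": "feature-extraction",
        "inference_provider": "hf-inference",  # assumed filter name
        "limit": 20,
    },
    timeout=30,
)
resp.raise_for_status()
for model in resp.json():
    print(model["id"])
```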
Why CLIP returns 404 on the router (your exact symptom)
Your CLIP probes all return:
- Status: `404`
- `Content-Type: text/plain`
- Body: `Not Found`
That is consistent with the model not being deployed by any Inference Provider. The model page for openai/clip-vit-base-patch32 explicitly says:
“This model isn’t deployed by any Inference Provider.” (Hugging Face)
If no provider serves the model, the router cannot route it, so a 404 is expected. It is not a parsing issue; the “Unexpected token N” error is just your code trying to JSON-decode a plain-text 404 body.
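One way to make that failure mode explicit, sketched in Python (the probe URL and the error handling are illustrative, not a prescribed pattern):

```python
# Sketch: check status code and content type before JSON-decoding the router
# response, so a plain-text "Not Found" body surfaces as a clear error instead
# of a JSON parse failure like "Unexpected token N".
import os
import requests

url = (
    "https://huggingface.co/proxy/router.huggingface.co/hf-inference/models/"
    "openai/clip-vit-base-patch32/pipeline/feature-extraction"
)
resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {os.environ['HF_TOKEN']}"},
    json={"inputs": "probe"},
    timeout=30,
)

if resp.status_code == 404:
    # Plain-text "Not Found": the router has no provider for this model/task.
    raise RuntimeError(f"Not routable: {resp.text!r}")
if "application/json" not in resp.headers.get("Content-Type", ""):
    raise RuntimeError(f"Unexpected non-JSON response: {resp.text[:200]!r}")

data = resp.json()
```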
Is it “propagation” or a missing task-prefix?
For this specific CLIP model: neither.
- It’s not a rollout propagation delay; it’s simply not available on serverless providers right now. (Hugging Face)
- Adding task prefixes like `image-feature-extraction` won’t help if the model isn’t served anywhere.
How to programmatically distinguish “wrong URL” vs “model not served” vs “warming”
Use the Hub API’s provider metadata:
- The Hub API supports querying provider availability via `inferenceProviderMapping` and status via `inference` (warm / undefined). (Hugging Face)
That gives you a clean decision tree (a hedged Python sketch follows the list):
- No mapping at all → router will 404 (no provider serves it)
- Mapping exists but status is `staging` / not live → may fail or be inconsistent
- Mapping exists and model is cold/loading → you may see “loading”/5xx and should retry (not 404)
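A rough sketch of that decision tree in Python; the `expand[]` parameters and the exact response shape of `inferenceProviderMapping` are my assumptions about the Hub API, so verify them against the current reference:

```python
# Hedged sketch: classify a model as not-served / cold-or-staging / warm using
# Hub API metadata. Field names and response shapes are assumptions; verify them.
import requests

def classify(model_id: str) -> str:
    resp = requests.get(
        f"https://huggingface.co/proxy/huggingface.co/api/models/{model_id}",
        params={"expand[]": ["inferenceProviderMapping", "inference"]},
        timeout=30,
    )
    resp.raise_for_status()
    info = resp.json()

    if not info.get("inferenceProviderMapping"):
        return "not-served"       # no provider mapping: expect router 404
    if info.get("inference") == "warm":
        return "warm"             # served and warm: calls should succeed
    return "cold-or-staging"      # served but not warm: retry on loading/5xx

print(classify("openai/clip-vit-base-patch32"))            # expect "not-served"
print(classify("sentence-transformers/all-MiniLM-L6-v2"))  # usually "warm"
```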
What to do for image embeddings if you need CLIP specifically
Because openai/clip-vit-base-patch32 is not served serverlessly (today), your practical options are:
- Run it yourself:
  - Dedicated Inference Endpoint (GPU) or your own infra.
- Pick an alternative model that is served by a provider:
  - Check the model page for “Inference Providers” availability, or query `inferenceProviderMapping` as described above. (Hugging Face)
(Feature extraction on HF Inference is oriented toward text embeddings; there is no guarantee that “image embedding via CLIP” is exposed as a serverless pipeline.)
Key takeaways for your migration
- Use the pipeline route for text embeddings:
  - `POST /hf-inference/models/{MODEL}/pipeline/feature-extraction` (Hugging Face)
- Do not rely on `/v1/embeddings`:
  - The OpenAI-compatible router surface is chat-only. (Hugging Face)
- Your CLIP 404s are expected:
  - That model is not deployed by any Inference Provider, so the router cannot serve it. (Hugging Face)
- If you need image embeddings serverlessly:
  - Choose a model that is served (check `inferenceProviderMapping`) or host CLIP yourself. (Hugging Face)