404 Not Found / Unexpected Token 'N' on Serverless Inference Router (2026 Migration)

Hi there,

I am currently migrating my workflow from the legacy api-inference.huggingface.co to the new router.huggingface.co for embedding tasks (image and text), using the openai/clip-vit-base-patch32 and sentence-transformers/all-MiniLM-L6-v2 models.

Despite following the latest documentation and community discussions regarding the 2026 serverless inference migration, I am consistently receiving a 404 Not Found response (plain text, which causes an ‘Unexpected token N’ error when parsed as JSON) from the router.

I have already verified that my HF_TOKEN is valid and active. I have tested the following endpoint structures via a Supabase Edge Function:

  1. https://huggingface.co/proxy/router.huggingface.co/v1/embeddings (OpenAI-compatible)

  2. https://huggingface.co/proxy/router.huggingface.co/inference/v1/feature-extraction/[MODEL_ID]

  3. https://huggingface.co/proxy/router.huggingface.co/hf-inference/models/[MODEL_ID]

  4. https://huggingface.co/proxy/router.huggingface.co/pipeline/feature-extraction/[MODEL_ID]

Could you please provide the exact, currently supported URL structure for serverless feature-extraction tasks on the new router? Is there a known propagation issue or a specific task-prefix required for these models during the current rollout?


Seems no CLIP models are deployed now…


What changed in the 2026 router setup

Hugging Face now exposes two distinct “surfaces” under router.huggingface.co, and mixing them is the most common cause of “404 Not Found → Unexpected token N (Not Found)”:

  1. OpenAI-compatible /v1/*

    • Chat-only today (by Hugging Face’s own note).
    • If you call /v1/embeddings, you should expect 404 in many cases because embeddings are not offered via the OpenAI-compatible surface. (Hugging Face)
  2. Task/pipeline routes (Serverless / HF Inference provider)

    • This is where “feature extraction = embeddings” lives. Hugging Face’s Inference Providers docs define Feature Extraction specifically as “convert text into a vector (embedding)”. (Hugging Face)
    • For the legacy serverless provider (“HF Inference”), the working pattern is the pipeline URL shown below.

The currently supported URL structure for serverless text embeddings (feature-extraction)

:white_check_mark: Canonical router URL (HF Inference provider)

POST https://huggingface.co/proxy/router.huggingface.co/hf-inference/models/{MODEL_ID}/pipeline/feature-extraction

This is explicitly confirmed (with a concrete curl example) for sentence-transformers/all-MiniLM-L6-v2 in the model’s pinned update. (Hugging Face)

Request format (same shape as legacy)

  • Header: Authorization: Bearer <HF_TOKEN>
  • Header: Content-Type: application/json
  • Body: {"inputs": ["text1", "text2"]} (or a single string)

That matches what you already got working (200 + vectors).
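For reference, here is a minimal sketch of that call from a Deno/TypeScript context such as a Supabase Edge Function (the model ID and the HF_TOKEN secret name are placeholders from your setup; the URL is the pipeline route above):

// Minimal sketch: text embeddings via the HF Inference pipeline route.
// Assumes HF_TOKEN is available as an environment secret (e.g. in a Supabase Edge Function).
const MODEL_ID = "sentence-transformers/all-MiniLM-L6-v2";
const url = `https://huggingface.co/proxy/router.huggingface.co/hf-inference/models/${MODEL_ID}/pipeline/feature-extraction`;

const res = await fetch(url, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${Deno.env.get("HF_TOKEN")}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ inputs: ["text1", "text2"] }),
});

if (!res.ok) {
  // A plain-text 404 body ("Not Found") lands here instead of blowing up JSON.parse.
  throw new Error(`Router error ${res.status}: ${await res.text()}`);
}

// For this model you get one embedding (array of floats) per input string.
const vectors: number[][] = await res.json();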


Why /v1/models doesn’t list all-MiniLM-L6-v2 (but inference still works)

GET https://huggingface.co/proxy/router.huggingface.co/v1/models is part of the OpenAI-compatible chat surface; the docs present it in the “OpenAI-compatible chat completions endpoint” section and also state that this OpenAI-compatible endpoint is chat tasks only. (Hugging Face)

For HF Inference models, use the Hub listing / Hub API instead:

  • Hub API supports listing models by provider:

    • https://huggingface.co/api/models?inference_provider=hf-inference (Hugging Face)
  • The Hub UI shows sentence-transformers/all-MiniLM-L6-v2 under the hf-inference filter. (Hugging Face)

So: the model's absence from /v1/models is expected and does not contradict successful calls to the pipeline route.
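If you want to verify this programmatically, a quick sketch against that Hub listing endpoint (the limit parameter is optional and just keeps the output small):

// Sketch: list models currently served by the hf-inference provider via the Hub API.
const listRes = await fetch(
  "https://huggingface.co/api/models?inference_provider=hf-inference&limit=20",
);
const served: Array<{ id: string }> = await listRes.json();
console.log(served.map((m) => m.id)); // e.g. includes sentence-transformers/all-MiniLM-L6-v2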


Why CLIP returns 404 on the router (your exact symptom)

Your CLIP probes all return:

  • 404
  • Content-Type: text/plain
  • Body: Not Found

That is consistent with the model not being deployed by any Inference Provider. The model page for openai/clip-vit-base-patch32 explicitly says:

“This model isn’t deployed by any Inference Provider.” (Hugging Face)

If no provider serves the model, the router cannot route it → 404 is expected (not a parsing issue; the parsing error is just your code trying to JSON-decode a plain-text 404 body).
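A small defensive-parsing sketch for the Edge Function side, so a plain-text 404 surfaces as a readable error rather than “Unexpected token N”:

// Sketch: check status and content type before JSON-decoding the router response.
async function parseRouterResponse(res: Response): Promise<unknown> {
  const contentType = res.headers.get("content-type") ?? "";
  if (!res.ok || !contentType.includes("application/json")) {
    // e.g. 404 + text/plain + "Not Found" → route or model is not served
    throw new Error(`Router returned ${res.status} (${contentType}): ${await res.text()}`);
  }
  return res.json();
}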

Is it “propagation” or a missing task-prefix?

For this specific CLIP model: neither.

  • It’s not a rollout propagation delay; it’s simply not available on serverless providers right now. (Hugging Face)
  • Adding task prefixes like image-feature-extraction won’t help if the model isn’t served anywhere.

How to programmatically distinguish “wrong URL” vs “model not served” vs “warming”

Use the Hub API’s provider metadata:

  • Hub API supports querying provider availability via inferenceProviderMapping and status via inference (warm / undefined). (Hugging Face)

That gives you a clean decision tree (sketched in code after this list):

  • No mapping at all → router will 404 (no provider serves it)
  • Mapping exists but status is staging / not live → may fail or be inconsistent
  • Mapping exists and model is cold/loading → you may see “loading”/5xx and should retry (not 404)
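Here is a sketch of that decision tree, assuming the Hub model-info endpoint returns the inferenceProviderMapping and inference fields as described above (the exact field shapes below are illustrative, not authoritative):

// Sketch: classify why a router call might fail, using Hub API model metadata.
// Field names/shapes are assumptions based on the docs referenced above.
type ModelInfo = {
  inference?: string; // e.g. "warm"
  inferenceProviderMapping?: Record<string, { status?: string; task?: string }>;
};

async function classifyAvailability(modelId: string): Promise<string> {
  const res = await fetch(
    `https://huggingface.co/api/models/${modelId}?expand[]=inferenceProviderMapping&expand[]=inference`,
  );
  if (!res.ok) return `Hub lookup failed (${res.status})`;
  const info: ModelInfo = await res.json();

  const mapping = info.inferenceProviderMapping ?? {};
  if (Object.keys(mapping).length === 0) {
    return "No provider serves this model → router 404 is expected";
  }
  if (!Object.values(mapping).some((m) => m.status === "live")) {
    return "Providers listed but none live (staging) → calls may fail or be inconsistent";
  }
  if (info.inference !== "warm") {
    return "Served but cold → expect loading/5xx; retry rather than treat as 404";
  }
  return "Served and warm → a 404 now points to a wrong URL/route";
}

// Example: await classifyAvailability("openai/clip-vit-base-patch32")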

What to do for image embeddings if you need CLIP specifically

Because openai/clip-vit-base-patch32 is not served serverlessly (today), your practical options are:

  1. Run it yourself:

    • Dedicated Inference Endpoint (GPU) or your own infra.
  2. Pick an alternative model that is served by a provider:

    • Check the model page for “Inference Providers” availability or query inferenceProviderMapping as described above. (Hugging Face)

(Feature extraction on HF Inference is oriented toward text embeddings; there is no guarantee that “image embedding via CLIP” is exposed as a serverless pipeline.)


Key takeaways for your migration

  • Use pipeline route for text embeddings:

    • POST /hf-inference/models/{MODEL}/pipeline/feature-extraction (Hugging Face)
  • Do not rely on /v1/embeddings:

    • The OpenAI-compatible router surface is chat-only. (Hugging Face)
  • Your CLIP 404s are expected:

    • That model is not deployed by any Inference Provider, so the router cannot serve it. (Hugging Face)
  • If you need image embeddings serverlessly:

    • Choose a model that is served (check inferenceProviderMapping) or host CLIP yourself. (Hugging Face)