There is established know-how for exactly this kind of problem.
What matters most in your case
Your queries are short (4–5 tokens) but dense with constraints:
- Numbers + units: 12 kVA, 5 hp
- Counts: 3 axis, 3 phase
- Alphanumerics: model/SKU/standards (common in B2B)
Dense embeddings are excellent for semantic proximity (“generator” ↔ “genset”, “industrial backup” ↔ “standby”), but they are often unreliable at enforcing numeric precision and exact spec constraints in a cosine-similarity-only setup. Empirical work on embeddings and numeric detail shows this is a consistent failure mode. (Hugging Face)
For that reason, the best-performing B2B search stacks treat embeddings as one signal in reranking, not the judge of truth for numbers/units/codes.
Target behavior: “spec correctness” first, semantics second
For B2B, success is typically:
- Retrieve the right product type (generator, compressor, milling machine)
- Ensure the top results satisfy the hard constraints (kVA/HP/axis/phase)
- Use semantics to break ties (application, cooling type, brand preference, etc.)
This informs how you should preprocess attributes and how you should score candidates.
Step 1 — Normalize the catalog attributes before embedding
Attribute embeddings become dramatically more stable once the underlying attributes are consistent.
1) Canonicalize attribute keys (schema consolidation)
Make a canonical key set and map synonyms into it:
power, power_rating, rated_power → power_rating
phases, phase_count → phase
cooling, cooling_type → cooling_type
Store both the raw and the canonical key (key_raw, key_canonical).
This reduces fragmentation in both lexical and semantic matching.
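A minimal mapping sketch (the synonym table and helper name are illustrative, not from your schema):

```python
# Hand-maintained synonym table: raw attribute keys -> canonical keys.
KEY_SYNONYMS = {
    "power": "power_rating",
    "rated_power": "power_rating",
    "phases": "phase",
    "phase_count": "phase",
    "cooling": "cooling_type",
}

def canonicalize_keys(attributes: dict) -> dict:
    """Map raw attribute keys onto the canonical key set; keep unknown keys as-is."""
    canonical = {}
    for raw_key, value in attributes.items():
        key = raw_key.strip().lower()
        canonical[KEY_SYNONYMS.get(key, key)] = value
    return canonical

# canonicalize_keys({"rated_power": "12 kVA", "phases": "3"})
# -> {"power_rating": "12 kVA", "phase": "3"}
```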
2) Normalize numeric values + units (raw + canonical)
For each numeric attribute, store both representations:
value_raw_text: "12 kva"
value_number: 12
unit_raw: "kva"
unit_canonical: "kVA"
value_canonical_base: 12000
unit_base: "VA"
This enables:
- exact/range matching
- unit conversion (kVA ↔ VA, HP ↔ W, inch ↔ mm)
- controlled tolerances (±5%, bucket ranges)
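A minimal parsing sketch, assuming a small hand-written unit table; in practice a units library (e.g., pint) can replace the conversion factors:

```python
import re

# Illustrative subset: unit -> (canonical spelling, base unit, factor to base).
UNIT_TABLE = {
    "kva": ("kVA", "VA", 1000.0),
    "va":  ("VA",  "VA", 1.0),
    "hp":  ("hp",  "W",  745.7),
    "kw":  ("kW",  "W",  1000.0),
}

NUM_UNIT_RE = re.compile(r"(?P<num>\d+(?:[.,]\d+)?)\s*(?P<unit>[a-zA-Z]+)")

def normalize_numeric(value_raw_text: str) -> dict | None:
    """Parse '12 kva' into raw + canonical representations; None if unparseable."""
    match = NUM_UNIT_RE.search(value_raw_text)
    if not match or match.group("unit").lower() not in UNIT_TABLE:
        return None
    number = float(match.group("num").replace(",", "."))
    unit_raw = match.group("unit").lower()
    unit_canonical, unit_base, factor = UNIT_TABLE[unit_raw]
    return {
        "value_raw_text": value_raw_text,
        "value_number": number,
        "unit_raw": unit_raw,
        "unit_canonical": unit_canonical,
        "value_canonical_base": number * factor,
        "unit_base": unit_base,
    }

# normalize_numeric("12 kva")
# -> {"value_number": 12.0, "value_canonical_base": 12000.0, "unit_base": "VA", ...}
```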
3) Normalize categorical attributes into controlled vocab
Example:
fuel_type: diesel / gasoline / natural_gas / LPG
cooling_type: air_cooled / water_cooled / oil_cooled
Store:
value_raw
value_canonical
4) Normalize codes (alphanumerics) into multiple searchable forms
For MPNs, SKUs, standards, thread sizes, etc.:
AB-1234 → AB1234, AB 1234
M12x1.75 → M12 x 1.75, M12x1.75
These fields usually need strong lexical treatment (exact/partial/regex/character n-grams). Dense embeddings should not be your only tool here.
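A minimal variant-generation sketch for the lexical index (the rules are illustrative; tune them per code family):

```python
import re

def code_variants(code: str) -> set[str]:
    """Generate searchable spellings of an alphanumeric code for lexical matching."""
    base = code.strip()
    variants = {base, base.upper()}
    variants.add(re.sub(r"[-_ ]", "", base))              # AB-1234  -> AB1234
    variants.add(re.sub(r"[-_]", " ", base))              # AB-1234  -> AB 1234
    variants.add(re.sub(r"(?<=\w)x(?=\d)", " x ", base))  # M12x1.75 -> M12 x 1.75
    return variants

# code_variants("AB-1234")  -> {"AB-1234", "AB1234", "AB 1234"}
# code_variants("M12x1.75") -> includes "M12 x 1.75" alongside the raw form
```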
Step 2 — Build the attribute text that you embed (the best “view” for spec queries)
You listed 4 serialization options. For your query style, the best default is:
Recommended default: Line-separated key: value (option #3)
Example “spec view” for the generator:
product_type: diesel generator
power_rating: 12 kVA
power_rating_va: 12000 VA
fuel_type: diesel
phase: 3
cooling_type: air cooled
application: industrial backup
Why it works well:
- Keys disambiguate values (especially numbers like 3).
- Newlines preserve boundaries cleanly (avoids attribute “bleeding” that happens with flat concatenation).
- Adding canonical numeric variants (power_rating_va) gives stable anchors.
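A minimal sketch of building the spec view from normalized attributes (the fixed key list is illustrative; keep one per category):

```python
def build_spec_view(product: dict) -> str:
    """Serialize normalized attributes into line-separated 'key: value' text."""
    # A fixed, high-signal key order keeps the view consistent across products.
    spec_keys = [
        "product_type", "power_rating", "power_rating_va",
        "fuel_type", "phase", "cooling_type", "application",
    ]
    return "\n".join(
        f"{key}: {product[key]}" for key in spec_keys if product.get(key) is not None
    )

# Produces the "spec view" text shown above, ready to embed.
```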
Add controlled redundancy for query variance
Users type messy variants: 12kva, 12 kva, 12 KVA, 3-phase, 3 phase.
Add a small number of variants (don’t spam):
power_rating: 12 kVA (raw: 12 kva)
phase: 3 (aka: 3-phase, three phase)
Keep it consistent across products.
Step 3 — Don’t embed “attributes” as only one blob; use two views
A single attribute blob forces one vector to represent both “hard specs” and “soft semantics.” In B2B, those behave differently.
View A: Spec view (structured, compact)
- The line-separated key: value text above
- Goal: capture spec tokens and field context
View B: Intent view (short, template-like natural language)
Not a long paragraph. Keep it short:
Industrial standby diesel generator for backup power. 12 kVA, 3-phase, air-cooled.
Goal: improve synonyms and intent matching without drowning out spec tokens.
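A minimal templating sketch for the intent view (the template text is illustrative and would be defined per category):

```python
def build_intent_view(product: dict) -> str:
    """Render a short, template-like summary; compact enough that spec tokens are not drowned out."""
    template = "{application} {product_type}. {power_rating}, {phase}-phase, {cooling_type}."
    return template.format(**product)

# build_intent_view({"application": "industrial backup", "product_type": "diesel generator",
#                    "power_rating": "12 kVA", "phase": 3, "cooling_type": "air cooled"})
# -> "industrial backup diesel generator. 12 kVA, 3-phase, air cooled."
```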
Why not only passage-style (#4)?
Passage-style can help “application/intent,” but it often introduces filler and reduces the density of spec tokens. In your query regime, that can hurt.
The balanced approach is:
- spec view for constraints
- intent view for semantics
Step 4 — Combined vs per-attribute embeddings (what I would do)
Default: embed combined views, not every attribute separately
- spec_vec from the spec view
- intent_vec from the intent view
- (and title_vec for retrieval)
This keeps infra simple and gives strong signal.
Selectively add per-attribute or per-group embeddings (only where it helps)
Per-attribute embeddings make sense when:
- a field is long/semantic (application, compatible_materials, standards_notes)
- you want explicit weighting by field
They are usually not worth it for:
- small numeric fields (phase, axis_count, power_rating) because you should score those deterministically
Store multiple vectors per product if your DB supports it
Many vector DBs support multiple vectors per object (e.g., “named vectors”). Qdrant documents storing multiple named vector spaces per point. (qdrant.tech)
Milvus provides multi-vector hybrid search examples and then reranking strategies to merge results. (milvus.io)
Practical implication:
- store title_vec, spec_vec, intent_vec
- score them separately and combine in reranking
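A minimal sketch of storing the three views as named vectors in Qdrant, assuming the qdrant-client Python API (collection name, vector size, and the placeholder embeddings are assumptions):

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(url="http://localhost:6333")

# One named vector space per view; size depends on your embedding model.
client.create_collection(
    collection_name="products",
    vectors_config={
        "title_vec": VectorParams(size=768, distance=Distance.COSINE),
        "spec_vec": VectorParams(size=768, distance=Distance.COSINE),
        "intent_vec": VectorParams(size=768, distance=Distance.COSINE),
    },
)

# Placeholder embeddings; replace with the outputs of your embedding model.
title_emb, spec_emb, intent_emb = [0.0] * 768, [0.0] * 768, [0.0] * 768

client.upsert(
    collection_name="products",
    points=[
        PointStruct(
            id=1,
            vector={"title_vec": title_emb, "spec_vec": spec_emb, "intent_vec": intent_emb},
            payload={"sku": "AB-1234", "power_rating_va": 12000, "phase": 3},
        )
    ],
)

# Retrieve against one named vector space; combine per-view scores later in reranking.
hits = client.search(
    collection_name="products",
    query_vector=("spec_vec", [0.0] * 768),  # (vector name, query embedding)
    limit=50,
)
```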
Step 5 — Candidate retrieval should be hybrid (dense + lexical), then fused
Even if your initial plan is “title embeddings for candidates,” B2B search benefits heavily from hybrid retrieval:
- Dense vectors: semantic category matching
- Lexical/BM25: units, codes, exact tokens (kVA, hp, M12x1.75)
Weaviate explains hybrid search as running keyword (BM25) and vector search in parallel and then fusing results with algorithms like Reciprocal Rank Fusion (RRF). (Weaviate)
Why it matters for you:
- A query like 12 kva diesel generator has “hard anchors” (12, kva, diesel).
- If dense retrieval alone underweights any anchor, you can miss the best candidates entirely.
- Hybrid protects recall.
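A minimal Reciprocal Rank Fusion sketch for merging the BM25 and dense candidate lists (k=60 is the commonly used RRF constant):

```python
from collections import defaultdict

def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked candidate ID lists with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# bm25_ids  = ["p7", "p3", "p9"]
# dense_ids = ["p3", "p1", "p7"]
# rrf_fuse([bm25_ids, dense_ids]) -> candidates found by both retrievers rise to the top.
```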
Step 6 — Attribute reranking: treat numbers/units/codes as explicit features
Reranking is where your “attribute embeddings” should pay off, but not as a single cosine score.
Reranking signals I would compute for each candidate (top-K)
A) Deterministic constraint features (high weight)
From query parsing + catalog normalization:
- Numeric match score:
  - exact match (after unit conversion)
  - within tolerance (e.g., ±5% or a domain-specific margin)
  - bucket match (10–15 kVA)
- Unit compatibility:
  - same unit / convertible / mismatch
- Categorical matches:
  - fuel_type, phase, axis_count, voltage class, etc.
- Code match:
  - exact/prefix/normalized match for MPN/SKU/standard codes
These features often dominate business relevance in B2B.
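A minimal sketch of the numeric-match feature, assuming the query constraint and catalog value are already converted to the same base unit (the thresholds are illustrative):

```python
def numeric_match(query_value: float, doc_value: float, tolerance: float = 0.05) -> float:
    """Score a numeric constraint: exact > within tolerance > loose bucket > miss."""
    if doc_value == query_value:
        return 1.0
    relative_error = abs(doc_value - query_value) / query_value
    if relative_error <= tolerance:
        return 0.8
    if relative_error <= 0.25:  # loose "bucket" band, e.g. 10–15 kVA for a 12 kVA query
        return 0.4
    return 0.0

# Query "12 kva" -> 12000 VA: a 12.5 kVA unit scores 0.8, a 20 kVA unit scores 0.0.
```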
B) Embedding similarity features (medium weight)
- sim(query, spec_vec)
- sim(query, intent_vec)
- optionally sim(query, title_vec)
Treat these as soft evidence.
C) Cross-encoder reranker score (often the biggest lift)
A cross-encoder reranker reads query + candidate together and outputs a relevance score directly.
- Cohere’s reranking docs explicitly note support for semi-structured data (JSON) and the ability to set “rank fields” so the model focuses on specific fields. (docs.cohere.com)
- The open bge-reranker-v2-m3 model card describes reranking as directly scoring (query, document) rather than embedding both separately. (Hugging Face)
Why this helps with your attributes:
- The model sees phase: 3 and the query token 3 phase in the same context.
- It can learn that 3 is the phase here, not the axis count.
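A minimal reranking sketch with the open bge-reranker-v2-m3 model via FlagEmbedding; the FlagReranker usage follows its model card, but treat the exact arguments as an assumption to verify:

```python
from FlagEmbedding import FlagReranker

# Cross-encoder: reads query + candidate together and outputs a relevance score.
reranker = FlagReranker("BAAI/bge-reranker-v2-m3", use_fp16=True)

query = "12 kva diesel generator 3 phase"
candidates = [
    "product_type: diesel generator\npower_rating: 12 kVA\nphase: 3\ncooling_type: air cooled",
    "product_type: diesel generator\npower_rating: 30 kVA\nphase: 3\ncooling_type: water cooled",
]

scores = reranker.compute_score([[query, doc] for doc in candidates])
ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
```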
A concrete scoring shape
For each candidate document (d) and query (q):
score = w_rerank * reranker(q, d)
      + w_num * numeric_match(q, d)
      + w_cat * categorical_match(q, d)
      + w_code * code_match(q, d)
      + w_vec * (sim_spec + sim_intent)
      + w_lex * bm25(q, d) (optional in reranking)
Start with hand-tuned weights, then learn them (LTR) once you have clicks/orders.
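A minimal combination sketch with hand-tuned starter weights (all weights are illustrative, and each signal is assumed to be normalized to roughly [0, 1] beforehand):

```python
def combined_score(signals: dict[str, float], weights: dict[str, float] | None = None) -> float:
    """Blend reranker, deterministic constraint, vector, and lexical signals into one score."""
    w = weights or {
        "rerank": 0.45, "num": 0.20, "cat": 0.15,
        "code": 0.10, "vec": 0.07, "lex": 0.03,
    }
    return (
        w["rerank"] * signals["reranker"]
        + w["num"] * signals["numeric_match"]
        + w["cat"] * signals["categorical_match"]
        + w["code"] * signals["code_match"]
        + w["vec"] * (signals["sim_spec"] + signals["sim_intent"])
        + w["lex"] * signals.get("bm25", 0.0)  # optional in reranking
    )
```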
Step 7 — Answering your specific questions directly
1) Single combined text vs individual attribute embeddings
Recommended default
- Single spec view embedding + single intent view embedding
Selective per-attribute embeddings
- Only for long semantic fields where it improves recall/precision and where field weighting matters.
2) Does preserving keys help?
Yes, especially for numbers and ambiguous short values. Keys create context that disambiguates 3 and 12. Field-aware reranking approaches also assume multi-field structure. (docs.cohere.com)
3) Are separators/formatting important?
Yes, but simple is best:
- key: value with newlines is robust and debuggable.
- | separators are fine; they usually don’t outperform newlines if keys are present.
4) Best practices for numeric values/units/alphanumerics
- Parse and normalize into canonical numeric forms (for matching/filtering)
- Keep raw strings alongside canonical forms (for audit + lexical anchoring)
- Add a small set of common aliases/variants (not too many)
- Treat codes as lexical-first (exact/prefix/n-gram), and use embeddings as secondary
5) Passage-style vs structured key-value
- Passage-style helps soft semantics (application, intent)
- Structured KV helps constraint grounding
Use both views; don’t force one representation to do both jobs.
Step 8 — Model choice: what I would evaluate for your pipeline
You mentioned Marqo ecommerce embedding (large); it is explicitly positioned as an ecommerce embedding model on Hugging Face. (Hugging Face)
For your case, I would evaluate a small, controlled shortlist:
Embeddings
- Marqo/marqo-ecommerce-embeddings-L (commerce-tuned baseline) (Hugging Face)
- BAAI/bge-m3 (popular general retrieval baseline; good for long text and multi-granularity setups) (Hugging Face)
- Qwen/Qwen3-Embedding-4B (embedding + ranking family; useful if you want paired embed + rerank within one ecosystem) (Hugging Face)
Reranking
- BAAI/bge-reranker-v2-m3 (open reranker; query+doc → score) (Hugging Face)
- If using Cohere rerank, exploit rank fields to prioritize specific keys/fields in your semi-structured document. (docs.cohere.com)
The model choice should ultimately be driven by your own “spec-heavy” evaluation set (next section).
Step 9 — Evaluation that matches B2B reality (what I would measure)
Generic IR metrics can hide spec failures. You want at least one metric that measures constraint satisfaction.
Build an internal benchmark (must-have)
Create a labeled set stratified by query type:
- numeric + unit (12 kVA, 5 hp, 200 psi)
- count constraints (3 axis, 3 phase, 2 pole)
- codes (AB-1234, M12x1.75)
- pure semantic queries (no numbers)
Track:
- Recall@K for candidate retrieval
- nDCG@10 / MRR for reranking quality
- Constraint satisfaction rate: top-1 satisfies extracted constraints (your business KPI)
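A minimal sketch of the constraint satisfaction metric, assuming each labeled query carries its extracted constraints and that results expose normalized attributes (names here are illustrative):

```python
def constraint_satisfaction_rate(labeled_queries: list[dict], search_fn) -> float:
    """Fraction of queries whose top-1 result satisfies every extracted constraint.

    Each labeled query is assumed to look like:
    {"text": "12 kva diesel generator",
     "constraints": {"power_rating_va": 12000, "fuel_type": "diesel"}}
    """
    satisfied = 0
    for query in labeled_queries:
        top1 = search_fn(query["text"])[0]  # top-ranked product with normalized attributes
        # Swap == for a tolerance check on numeric fields if exact equality is too strict.
        if all(top1.get(key) == value for key, value in query["constraints"].items()):
            satisfied += 1
    return satisfied / len(labeled_queries) if labeled_queries else 0.0
```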
Do ablations (so you know what helped)
Run these variants:
- title_vec only
- title_vec + spec_vec
- title_vec + spec_vec + intent_vec
- add deterministic numeric/categorical/code features
- add reranker (cross-encoder)
This isolates whether embeddings are helping, and where.
Step 10 — Common pitfalls and how to avoid them
1) Over-reliance on cosine similarity for specs
This is the biggest cause of “looks relevant but wrong kVA/HP/axis” results. Use deterministic features for constraints and rerankers for context.
2) Too many attributes in the embedded text
Dumping 150 attributes reduces signal density. Prefer:
- a fixed “high-signal” attribute set per category
- plus a few category-specific keys
3) Multi-vector infra friction
Many frameworks assume one vector per record. A recent LlamaIndex issue shows practical friction when trying to use multiple dense vector fields in Milvus-backed stores, with workarounds like pre-creating schemas. (GitHub)
Plan for this early: choose a store/framework path that supports multi-vector cleanly or isolate it in your application layer.
4) Losing raw tokens during normalization
If you normalize away user-typed variants, you can hurt lexical/hybrid matching. Keep raw forms.
5) Logging and debuggability gaps
For every query, log:
- parsed constraints
- matched constraints per result
- spec view text used
- intent view text used
- per-signal scores (numeric, lexical, vector, reranker)
This turns relevance tuning into an engineering loop.
A practical “do this first” implementation plan
1. Normalize attributes (keys, numeric units, categorical vocab, code variants).
2. Create two texts per product:
   - spec view: newline key: value + canonical numeric fields
   - intent view: short template summary
3. Embed title, spec, intent. Store as separate vectors if supported (named vectors / multi-vector). (qdrant.tech)
4. Candidate retrieval: hybrid (BM25 + dense) fused via RRF. (Weaviate)
5. Rerank top-K using:
   - deterministic constraint features (numeric/unit/categorical/code)
   - embedding similarities
   - a cross-encoder reranker (field-aware if possible) (docs.cohere.com)
6. Evaluate with a spec-heavy benchmark and iterate via ablations.
Recommendation on your 6 strategies (final)
If you want a single answer:
- Use (3) line-separated key/value as the primary attribute embedding input (spec view).
- Add a second short intent view (controlled natural language) for semantics.
- Use (5) per-attribute embeddings only for a small set of semantic fields if needed.
- Do not rely on (4) long passages alone for spec-heavy queries.
- Back embeddings with deterministic numeric/unit/code matching and (ideally) a cross-encoder reranker for final ordering. (docs.cohere.com)