nomic-embed-text-v1-unsupervised: A Reproducible Long Context (8192) Text Embedder

nomic-embed-text-v1-unsupervised is a text encoder with an 8192-token context length. This checkpoint is taken after the contrastive pretraining stage of the multi-stage contrastive training that produced the final model. We are releasing it to open-source the training artifacts from our Nomic Embed Text tech report, "Nomic Embed: Training a Reproducible Long Context Text Embedder" (arXiv 2402.01613).
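For readers unfamiliar with contrastive pretraining, the following is a minimal sketch of an InfoNCE-style loss with in-batch negatives, written in NumPy. It is an illustration under stated assumptions, not the training code behind this checkpoint: the function name `info_nce_loss`, the temperature default of 0.07, and the toy data are all hypothetical choices made for the example.

```python
import numpy as np


def info_nce_loss(queries, documents, temperature=0.07):
    """InfoNCE-style loss: row i of `queries` is paired with row i of `documents`.

    Every other row in the batch serves as a negative for that query.
    The temperature default is an illustrative assumption.
    """
    # L2-normalize so the dot product is a cosine similarity.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = documents / np.linalg.norm(documents, axis=1, keepdims=True)

    # (batch, batch) similarity matrix, scaled by the temperature.
    logits = (q @ d.T) / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The positive for query i sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))


# Toy batch: each "document" is a slightly noisy copy of its query.
rng = np.random.default_rng(0)
toy_queries = rng.normal(size=(4, 8))
toy_documents = toy_queries + 0.01 * rng.normal(size=(4, 8))

aligned_loss = info_nce_loss(toy_queries, toy_documents)
# Reversing the rows breaks every pairing, so the loss should rise.
shuffled_loss = info_nce_loss(toy_queries, toy_documents[::-1])
```

With correctly paired rows the loss is small; breaking the pairing raises it, which is the signal contrastive training uses to pull matching texts together.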

If you want to use a model to extract embeddings, we suggest using nomic-embed-text-v1.
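As a rough sketch of what extracting embeddings with nomic-embed-text-v1 might look like, the helpers below load the model through the sentence-transformers library and compare the resulting vectors. Several things here are assumptions not stated in this document: that sentence-transformers is installed, that the model needs `trust_remote_code=True`, and that inputs take a task prefix such as `search_document: ` (check the nomic-embed-text-v1 model card for the exact prefixes). The helper names `add_prefix`, `cosine_similarity`, and `embed` are hypothetical.

```python
import numpy as np

# Assumption: the prefix convention comes from the nomic-embed-text-v1
# model card, not from this document. Verify it there before relying on it.
DOCUMENT_PREFIX = "search_document: "


def add_prefix(texts, prefix=DOCUMENT_PREFIX):
    """Prepend a task prefix to every input text."""
    return [prefix + text for text in texts]


def cosine_similarity(a, b):
    """Pairwise cosine similarity between the rows of `a` and the rows of `b`."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def embed(texts, model_name="nomic-ai/nomic-embed-text-v1"):
    """Encode texts into vectors; downloads the model on first use."""
    # Imported lazily so the helpers above work without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name, trust_remote_code=True)
    return np.asarray(model.encode(add_prefix(texts)))
```

Calling `embed(["first text", "second text"])` and passing the result to `cosine_similarity` twice (as both arguments) would give a 2x2 similarity matrix.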

Paper for nomic-ai/nomic-embed-text-v1-unsupervised

Nomic Embed: Training a Reproducible Long Context Text Embedder (arXiv 2402.01613, published Feb 2, 2024)

Evaluation results

All scores are self-reported on the MTEB test sets.

AmazonCounterfactualClassification (en): accuracy 76.985, ap 39.472, f1 70.592
AmazonPolarityClassification: accuracy 87.540, ap 83.161, f1 87.523
AmazonReviewsClassification (en): accuracy 46.808, f1 46.263