Hebrew Manuscript Joint NER v2
This repository contains the MHM Pipeline person NER model. The current checkpoint is the role-aware v3 model. It replaces the earlier custom two-head checkpoint but keeps the same repository and bundle name for compatibility.
The model is a DictaBERT token-classification checkpoint that predicts BIO labels with the person role encoded directly in the tag:
- `AUTHOR`
- `TRANSCRIBER`
- `OWNER`
- `CENSOR`
- `TRANSLATOR`
- `COMMENTATOR`
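Because the role lives inside the tag, turning per-token predictions into role-labelled names is ordinary BIO span grouping. The sketch below assumes tags of the form `B-<ROLE>`, `I-<ROLE>` and `O`. The card does not spell out the exact label strings, so check `model.config.id2label` before relying on this format. The function name `bio_to_spans` is hypothetical and not part of MHM Pipeline.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Group role-aware BIO tags into (start, end, role) spans over tag positions.

    Assumes hypothetical tag strings like "B-AUTHOR" / "I-AUTHOR" / "O".
    `end` is exclusive. An I- tag that does not continue an open span of the
    same role is treated leniently as the start of a new span.
    """
    spans: List[Tuple[int, int, str]] = []
    start: Optional[int] = None
    role: Optional[str] = None
    for i, tag in enumerate(tags):
        prefix, _, tag_role = tag.partition("-")  # "B-AUTHOR" -> ("B", "-", "AUTHOR")
        if prefix == "I" and tag_role == role:
            continue  # the open span keeps growing
        if role is not None:
            spans.append((start, i, role))  # close the open span before position i
            start, role = None, None
        if prefix in ("B", "I"):
            start, role = i, tag_role  # open a new span at position i
    if role is not None:
        spans.append((start, len(tags), role))  # span running to the end of the sequence
    return spans


tags = ["O", "B-TRANSCRIBER", "I-TRANSCRIBER", "O", "B-AUTHOR"]
print(bio_to_spans(tags))  # [(1, 3, 'TRANSCRIBER'), (4, 5, 'AUTHOR')]
```

A name and its role are decoded together, so a single tag sequence answers both "who" and "in what capacity". A separate role classifier is not needed.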
Evaluation
Held-out v3 test split, 904 items:
| Metric | Score |
|---|---|
| strict span + role F1 | 0.8031 |
| strict precision | 0.7888 |
| strict recall | 0.8180 |
| name-only F1 | 0.8665 |
| role accuracy when name matched | 0.9269 |
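The card does not define these metrics precisely. The sketch below shows one plausible reading, under these assumptions:

- **Strict** means an exact match on item, character offsets and role.
- **Name-only** drops the role.
- **Role accuracy** is measured over predicted names whose span matches a gold span.
- All counts are micro-averaged over the split.

The `Mention` tuple layout and the function names are hypothetical. The reported strict F1 is at least consistent with the precision and recall above: 2 · 0.7888 · 0.8180 / (0.7888 + 0.8180) ≈ 0.8031.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical mention layout: (item_id, start, end, role), pooled over the test split.
Mention = Tuple[int, int, int, str]


def f1(precision: float, recall: float) -> float:
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)


def evaluate(gold: Iterable[Mention], pred: Iterable[Mention]) -> Dict[str, float]:
    gold, pred = set(gold), set(pred)
    # Strict: item, both character offsets and the role must all match.
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    # Name-only: ignore the role and compare (item, start, end) alone.
    gold_spans = {m[:3] for m in gold}
    pred_spans = {m[:3] for m in pred}
    span_tp = len(gold_spans & pred_spans)
    name_p = span_tp / len(pred_spans) if pred_spans else 0.0
    name_r = span_tp / len(gold_spans) if gold_spans else 0.0
    # Role accuracy over predicted mentions whose span matches a gold span.
    gold_role = {m[:3]: m[3] for m in gold}
    matched = [m for m in pred if m[:3] in gold_role]
    correct = sum(gold_role[m[:3]] == m[3] for m in matched)
    return {
        "strict_f1": f1(precision, recall),
        "strict_precision": precision,
        "strict_recall": recall,
        "name_f1": f1(name_p, name_r),
        "role_accuracy": correct / len(matched) if matched else 0.0,
    }


print(round(f1(0.7888, 0.8180), 4))  # 0.8031
```

Under the same reading, a per-role strict F1 follows from filtering both the gold and the predicted mentions to a single role before calling `evaluate`.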
Per-role strict span+role F1:
| Role | F1 |
|---|---|
| AUTHOR | 0.8678 |
| CENSOR | 0.8830 |
| COMMENTATOR | 0.5185 |
| OWNER | 0.7330 |
| TRANSCRIBER | 0.8112 |
| TRANSLATOR | 0.9072 |
Usage
```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

repo_id = "alexgoldberg/hebrew-manuscript-joint-ner-v2"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForTokenClassification.from_pretrained(repo_id)
```
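To see the raw per-token tags with their character positions, run the model directly and map each argmax class id through `model.config.id2label`. This is a minimal sketch, not the MHM Pipeline post-processing. It assumes that `AutoTokenizer` returns a fast tokenizer, because `return_offsets_mapping=True` is only supported by fast tokenizers. The helper `label_tokens` is hypothetical. The resulting tags still need BIO grouping into whole names.

```python
from typing import Dict, List, Sequence, Tuple


def label_tokens(
    offsets: Sequence[Tuple[int, int]],
    pred_ids: Sequence[int],
    id2label: Dict[int, str],
) -> List[Tuple[int, int, str]]:
    """Pair each real sub-word token's character span with its predicted tag."""
    labelled = []
    for (start, end), label_id in zip(offsets, pred_ids):
        if start == end:  # special tokens such as [CLS]/[SEP] get empty (0, 0) offsets
            continue
        labelled.append((start, end, id2label[label_id]))
    return labelled


if __name__ == "__main__":
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    repo_id = "alexgoldberg/hebrew-manuscript-joint-ner-v2"
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForTokenClassification.from_pretrained(repo_id)
    model.eval()

    text = "הספר נכתב על ידי משה בן יעקב."  # "The book was written by Moshe ben Yaakov."
    enc = tokenizer(text, return_offsets_mapping=True, return_tensors="pt")
    offsets = enc.pop("offset_mapping")[0].tolist()  # the model does not accept this key
    with torch.no_grad():
        logits = model(**enc).logits
    pred_ids = logits.argmax(dim=-1)[0].tolist()  # greedy per-token decoding
    for start, end, tag in label_tokens(offsets, pred_ids, model.config.id2label):
        print(text[start:end], tag)
```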
In MHM Pipeline, use `ner.inference_pipeline.JointNERPipeline`, which preserves the legacy output schema:
```python
from ner.inference_pipeline import JointNERPipeline

pipeline = JointNERPipeline("alexgoldberg/hebrew-manuscript-joint-ner-v2")
# "The book was written by Moshe ben Yaakov."
entities = pipeline.process_text("הספר נכתב על ידי משה בן יעקב.")
```
Example output:
```json
[
  {
    "person": "משה בן יעקב",
    "role": "TRANSCRIBER",
    "confidence": 0.9918,
    "model_confidence": 0.9918,
    "start": 17,
    "end": 28
  }
]
```
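Outside MHM Pipeline, a similar field layout can be approximated with the stock Transformers `pipeline("token-classification", ..., aggregation_strategy="simple")`. This option merges `B-`/`I-` tokens and reports the tag without its prefix as `entity_group`. This is a hedged sketch, not how `JointNERPipeline` works internally. The helper `to_legacy_schema` is hypothetical. Filling both `confidence` and `model_confidence` from the aggregated score is an assumption. The person string is sliced from the input text rather than taken from `word`, which can carry tokenizer artifacts.

```python
from typing import Any, Dict, List


def to_legacy_schema(text: str, entities: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Reshape aggregated token-classification output into the legacy field layout."""
    legacy = []
    for ent in entities:
        start, end = int(ent["start"]), int(ent["end"])
        # Assumption: both confidence fields come from the aggregated model score.
        score = round(float(ent["score"]), 4)
        legacy.append({
            "person": text[start:end],  # exact surface form from the input
            "role": ent["entity_group"],  # tag with its B-/I- prefix removed
            "confidence": score,
            "model_confidence": score,
            "start": start,
            "end": end,
        })
    return legacy


if __name__ == "__main__":
    from transformers import pipeline

    ner = pipeline(
        "token-classification",
        model="alexgoldberg/hebrew-manuscript-joint-ner-v2",
        aggregation_strategy="simple",  # group consecutive B-/I- tokens into one entity
    )
    text = "הספר נכתב על ידי משה בן יעקב."
    print(to_legacy_schema(text, ner(text)))
```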
Notes
The previous custom checkpoint can be recovered from the Hub commit history. This version intentionally replaces keyword-based role classification with neural role-aware BIO labels.
Base model: dicta-il/dictabert