How to use rmihaylov/bert-base-theseus-bg with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("fill-mask", model="rmihaylov/bert-base-theseus-bg")

# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("rmihaylov/bert-base-theseus-bg")
model = AutoModelForMaskedLM.from_pretrained("rmihaylov/bert-base-theseus-bg")

A pretrained model for the Bulgarian language, trained with a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. The model is cased: it distinguishes between "bulgarian" and "Bulgarian". The training data is Bulgarian text from OSCAR, Chitanka and Wikipedia.
The model was compressed via progressive module replacing.
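To make the idea of progressive module replacing concrete, here is a toy sketch. It assumes the term refers to the BERT-of-Theseus approach, as the model name suggests: during compression training, each module of the original network is swapped for its compact counterpart with some probability, and that probability is raised over time. The function names, the plain-number "modules", and the values of `k` and `b` are all hypothetical; this is not the training code used for this model.

```python
import random


def linear_replace_rate(step, k, b):
    """Replacement probability that grows linearly with the step, capped at 1."""
    return min(1.0, k * step + b)


def mixed_forward(x, predecessor_modules, successor_modules, replace_prob, rng):
    """Pass x through the stack, using each successor module with probability replace_prob."""
    for pred, succ in zip(predecessor_modules, successor_modules):
        module = succ if rng.random() < replace_prob else pred
        x = module(x)
    return x


# Toy stand-ins for network blocks: plain functions on numbers (assumption).
predecessor = [lambda x: x + 1] * 3   # original, larger modules
successor = [lambda x: x + 10] * 3    # compact replacements

rng = random.Random(0)
only_predecessor = mixed_forward(0, predecessor, successor, 0.0, rng)
only_successor = mixed_forward(0, predecessor, successor, 1.0, rng)
```

With a replacement probability of 0 every input goes through the original modules, and with 1 only the compact modules are used; intermediate values mix the two, which is what lets the compact modules learn to stand in for the originals.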
Here is how to use this model in PyTorch:
>>> from transformers import pipeline
>>>
>>> # device=0 selects the first GPU; use device=-1 to run on CPU.
>>> model = pipeline(
...     'fill-mask',
...     model='rmihaylov/bert-base-theseus-bg',
...     tokenizer='rmihaylov/bert-base-theseus-bg',
...     device=0,
...     revision=None)
>>> output = model("София е [MASK] на България.")
>>> print(output)
[{'score': 0.1586454212665558,
'sequence': 'София е столица на България.',
'token': 76074,
'token_str': 'столица'},
{'score': 0.12992817163467407,
'sequence': 'София е столица на България.',
'token': 2659,
'token_str': 'столица'},
{'score': 0.06064048036932945,
'sequence': 'София е Перлата на България.',
'token': 102146,
'token_str': 'Перлата'},
{'score': 0.034687548875808716,
'sequence': 'София е представителката на България.',
'token': 105456,
'token_str': 'представителката'},
{'score': 0.03053216263651848,
'sequence': 'София е присъединяването на България.',
'token': 18749,
'token_str': 'присъединяването'}]
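
The output above lists 'столица' twice, under token ids 76074 and 2659. When only the predicted string matters, one option is to merge such entries by summing their scores. The sketch below is a minimal illustration: `merge_by_token_str` is a hypothetical helper, not part of Transformers, and the sample list is a shortened copy of the output with rounded scores.

```python
from collections import defaultdict


def merge_by_token_str(predictions):
    """Sum the scores of predictions that decode to the same string."""
    totals = defaultdict(float)
    for pred in predictions:
        totals[pred["token_str"].strip()] += pred["score"]
    # Highest combined score first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Shortened copy of the pipeline output shown above (scores rounded).
output = [
    {"score": 0.1586, "token": 76074, "token_str": "столица"},
    {"score": 0.1299, "token": 2659, "token_str": "столица"},
    {"score": 0.0606, "token": 102146, "token_str": "Перлата"},
]

merged = merge_by_token_str(output)
```

`merged` is a list of `(token_str, combined_score)` pairs, so the two 'столица' entries collapse into a single candidate that ranks first.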