metadata
language:
- en
- de
library_name: transformers
pipeline_tag: text-generation
license: apache-2.0
tags:
- merge
- mergekit
Hermes + Leo = Hermeo
Hermeo-7B
A German-English language model merged from DPOpenHermes-7B-v2 and leo-mistral-hessianai-7b-chat using mergekit. Both base models are fine-tuned versions of Mistral-7B-v0.1.
Model details
- Merged from: leo-mistral-hessianai-7b-chat and DPOpenHermes-7B-v2
- Model type: Causal decoder-only transformer language model
- Languages: English and German
- License: Apache 2.0
How to use
You can use this model directly with a pipeline for text generation. Because generation involves random sampling, we set a seed for reproducibility:
>>> from transformers import pipeline, set_seed
>>> generator = pipeline('text-generation', model='malteos/hermeo-7b')
>>> set_seed(42)
>>> generator("Hallo, Ich bin ein Sprachmodell,", max_length=40, num_return_sequences=1)
[{'generated_text': 'Hallo, Ich bin ein Sprachmodell, das dir bei der Übersetzung von Texten zwischen Deutsch und Englisch helfen kann. Wenn du mir einen Text in Deutsch'}]
Acknowledgements
- This model release is heavily inspired by Weyaxi/OpenHermes-2.5-neural-chat-v3-2-Slerp
- Thanks to the authors of the base models: Mistral, LAION, HessianAI, Open Access AI Collective, @teknium, @bjoernp
- The German evaluation datasets and scripts from @bjoernp were used.
- The computing resources from DFKI's PEGASUS cluster were used for the evaluation.
Evaluation
The evaluation follows the methodology of the Open LLM Leaderboard.
German benchmarks
| German tasks: | MMLU-DE | Hellaswag-DE | ARC-DE | Average |
|---|---|---|---|---|
| Models / Few-shots: | (5 shots) | (10 shots) | (24 shots) | |
| 7B parameters | | | | |
| llama-2-7b | 0.400 | 0.513 | 0.381 | 0.431 |
| leo-hessianai-7b | 0.400 | 0.609 | 0.429 | 0.479 |
| bloom-6b4-clp-german | 0.274 | 0.550 | 0.351 | 0.392 |
| mistral-7b | 0.524 | 0.588 | 0.473 | 0.528 |
| leo-mistral-hessianai-7b | 0.481 | 0.663 | 0.485 | 0.543 |
| leo-mistral-hessianai-7b-chat | 0.458 | 0.617 | 0.465 | 0.513 |
| DPOpenHermes-7B-v2 | 0.517 | 0.603 | 0.515 | 0.545 |
| hermeo-7b (this model) | 0.511 | 0.668 | 0.528 | 0.569 |
| 13B parameters | | | | |
| llama-2-13b | 0.469 | 0.581 | 0.468 | 0.506 |
| leo-hessianai-13b | 0.486 | 0.658 | 0.509 | 0.551 |
| 70B parameters | | | | |
| llama-2-70b | 0.597 | 0.674 | 0.561 | 0.611 |
| leo-hessianai-70b | 0.653 | 0.721 | 0.600 | 0.658 |
English benchmarks
| English tasks: | MMLU | Hellaswag | ARC | Average |
|---|---|---|---|---|
| Models / Few-shots: | (5 shots) | (10 shots) | (24 shots) | |
| llama-2-7b | 0.466 | 0.786 | 0.530 | 0.594 |
| leolm-hessianai-7b | 0.423 | 0.759 | 0.522 | 0.568 |
| bloom-6b4-clp-german | 0.264 | 0.525 | 0.328 | 0.372 |
| mistral-7b | 0.635 | 0.832 | 0.607 | 0.691 |
| leolm-mistral-hessianai-7b | 0.550 | 0.777 | 0.518 | 0.615 |
| hermeo-7b (this model) | 0.601 | 0.821 | 0.620 | 0.681 |
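The Average column in both tables appears to be the unweighted mean of the three task scores. The card does not state this explicitly, so treat it as an assumption; the short sketch below only checks that assumption against a few rows copied from the tables. The helper name `mean_score` is illustrative.

```python
# Rows copied from the tables above: (MMLU, Hellaswag, ARC) scores and the reported average.
reported = {
    "hermeo-7b (German)": ((0.511, 0.668, 0.528), 0.569),
    "DPOpenHermes-7B-v2 (German)": ((0.517, 0.603, 0.515), 0.545),
    "mistral-7b (German)": ((0.524, 0.588, 0.473), 0.528),
    "hermeo-7b (English)": ((0.601, 0.821, 0.620), 0.681),
    "mistral-7b (English)": ((0.635, 0.832, 0.607), 0.691),
}


def mean_score(scores):
    """Unweighted arithmetic mean of the per-task scores (assumed definition of 'Average')."""
    return sum(scores) / len(scores)


for name, (scores, average) in reported.items():
    computed = mean_score(scores)
    # The tables are rounded to three decimals, so allow a small tolerance.
    assert abs(computed - average) < 0.001, name
    print(f"{name}: computed {computed:.3f}, reported {average:.3f}")
```

Under this reading, the averages weight each benchmark equally regardless of its size or number of few-shot examples.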
Prompting / Prompt Template
Prompt dialogue template (ChatML format):
"""
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
"""
The model input can contain multiple conversation turns between user and assistant, e.g.
<|im_start|>user
{prompt 1}<|im_end|>
<|im_start|>assistant
{reply 1}<|im_end|>
<|im_start|>user
{prompt 2}<|im_end|>
<|im_start|>assistant
(...)
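The template above can be assembled programmatically. The following is a minimal sketch, not the authors' code: the function name `build_chatml_prompt`, the `add_generation_prompt` parameter, and the example messages are all hypothetical, and the trailing newline after the final `<|im_start|>assistant` is an assumption read off the template layout.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_chatml_prompt(
    messages: List[Dict[str, str]], add_generation_prompt: bool = True
) -> str:
    """Render {"role", "content"} messages in the ChatML layout shown above."""
    allowed_roles = {"system", "user", "assistant"}
    parts = []
    for message in messages:
        role = message["role"]
        if role not in allowed_roles:
            raise ValueError(f"unsupported role: {role!r}")
        # Each turn: start marker, role, newline, content, end marker, newline.
        parts.append(f"{IM_START}{role}\n{message['content']}{IM_END}\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


# Hypothetical single-turn conversation with a system message.
conversation = [
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
    {"role": "user", "content": "Was ist die Hauptstadt von Deutschland?"},
]
prompt = build_chatml_prompt(conversation)
print(prompt)
```

The resulting string can be passed as the input text to the `generator` pipeline from the "How to use" section. For multi-turn input, append earlier `assistant` replies to the list before the next `user` message. If the tokenizer ships a chat template, `tokenizer.apply_chat_template` in Transformers is the usual alternative to hand-built strings.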
License
Apache 2.0
See also
- AWQ quantized version: https://huggingface.co/mayflowergmbh/hermeo-7b-awq
