How to use wordcab/t5-small-email-summarizer with Transformers:
# Use a pipeline as a high-level helper
# Warning: Pipeline type "summarization" is no longer supported in transformers v5.
# You must load the model directly (see below) or downgrade to v4.x with:
# 'pip install "transformers<5.0.0"'
from transformers import pipeline
pipe = pipeline("summarization", model="wordcab/t5-small-email-summarizer")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("wordcab/t5-small-email-summarizer")
model = AutoModelForSeq2SeqLM.from_pretrained("wordcab/t5-small-email-summarizer")

This is a fine-tuned T5-small model specialized for email summarization. The model can generate both brief (one-line) and detailed (comprehensive) summaries of emails, and is robust to messy, informal inputs with typos and abbreviations.
The summary type is selected with the summarize_brief: and summarize_full: prefixes.

The model was fine-tuned on the argilla/FinePersonas-Conversations-Email-Summaries dataset, which contains 364,000 email-summary pairs.
Training data was augmented with:
Emails are expected in the following input format:
Subject: [subject]. Body: [content]

pip install transformers torch
from transformers import T5ForConditionalGeneration, T5Tokenizer
# Load model and tokenizer
tokenizer = T5Tokenizer.from_pretrained("wordcab/t5-small-email-summarizer")
model = T5ForConditionalGeneration.from_pretrained("wordcab/t5-small-email-summarizer")
# Example email
email = """Subject: Team Meeting Tomorrow. Body: Hi everyone,
Just a reminder that we have our weekly team meeting tomorrow at 2 PM EST.
Please prepare your status updates and any blockers you're facing.
We'll also discuss the Q4 roadmap. Thanks!"""
# Generate brief summary
inputs = tokenizer(f"summarize_brief: {email}", return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(**inputs, max_length=50, num_beams=2)
brief_summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Brief: {brief_summary}")
# Generate full summary
inputs = tokenizer(f"summarize_full: {email}", return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(**inputs, max_length=150, num_beams=2)
full_summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Full: {full_summary}")
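The prompt used above combines a mode prefix with the `Subject: [subject]. Body: [content]` layout. A minimal sketch of a helper that assembles that string is shown below; `build_prompt` is a hypothetical name, not part of the model repository, and the stripping of surrounding whitespace is an assumption rather than a documented requirement.

```python
def build_prompt(subject: str, body: str, mode: str = "brief") -> str:
    """Assemble a model input string (hypothetical helper, not from the model repo).

    The layout mirrors the usage example: a "summarize_brief:" or
    "summarize_full:" prefix followed by "Subject: ... Body: ...".
    """
    if mode not in ("brief", "full"):
        raise ValueError(f"mode must be 'brief' or 'full', got {mode!r}")
    # Assumption: surrounding whitespace carries no meaning, so it is stripped.
    return f"summarize_{mode}: Subject: {subject.strip()}. Body: {body.strip()}"


if __name__ == "__main__":
    print(build_prompt("Team Meeting Tomorrow", "Hi everyone, ...", mode="full"))
```

The returned string can be passed to the tokenizer in place of the inline f-strings in the example above.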
For emails longer than 512 tokens, consider using chunking:
def summarize_long_email(email, model, tokenizer, mode="brief"):
    # Check if email fits in context
    tokens = tokenizer.encode(email)
    if len(tokens) <= 500:  # Leave room for prefix
        # Direct summarization
        prefix = f"summarize_{mode}:" if mode in ["brief", "full"] else "summarize:"
        inputs = tokenizer(f"{prefix} {email}", return_tensors="pt", max_length=512, truncation=True)
        outputs = model.generate(**inputs, max_length=150 if mode == "full" else 50)
        return tokenizer.decode(outputs[0], skip_special_tokens=True)
    # For longer emails, use strategic truncation or chunking
    # ... implement chunking strategy
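The chunking step is left open in the function above. One possible approach, sketched here under stated assumptions, is to split the token ids into overlapping windows, summarize each window separately, and join the partial summaries. The names `chunk_token_ids` and `summarize_in_chunks`, the window size of 480 tokens, and the overlap of 32 tokens are illustrative choices, not the authors' method. Chunks after the first will not start with the `Subject:` header, which may affect summary quality.

```python
from typing import List


def chunk_token_ids(token_ids: List[int], chunk_size: int = 480, overlap: int = 32) -> List[List[int]]:
    """Split token ids into windows of at most chunk_size, sharing `overlap` ids."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + chunk_size])
        if start + chunk_size >= len(token_ids):
            break  # the last window already reaches the end of the input
    return chunks


def summarize_in_chunks(email, model, tokenizer, mode="brief", chunk_size=480, overlap=32):
    """Summarize each window separately and join the results (illustrative only)."""
    # Encode without special tokens so each window decodes back to plain text.
    ids = tokenizer.encode(email, add_special_tokens=False)
    partial_summaries = []
    for chunk in chunk_token_ids(ids, chunk_size, overlap):
        text = tokenizer.decode(chunk, skip_special_tokens=True)
        inputs = tokenizer(f"summarize_{mode}: {text}", return_tensors="pt",
                           max_length=512, truncation=True)
        outputs = model.generate(**inputs, max_length=150 if mode == "full" else 50)
        partial_summaries.append(tokenizer.decode(outputs[0], skip_special_tokens=True))
    return " ".join(partial_summaries)
```

For a brief summary of a very long email, a second pass that summarizes the joined partial summaries may read better than plain concatenation; whether that helps with this model would need to be checked empirically.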
# Query the model through the Hugging Face Inference API
import requests

API_URL = "https://api-inference.huggingface.co/models/wordcab/t5-small-email-summarizer"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "summarize_brief: Subject: Meeting. Body: Let's meet tomorrow at 3pm to discuss the project.",
})
docker run --gpus all -p 8080:80 \
-v t5-small-email-summarizer:/model \
ghcr.io/huggingface/text-generation-inference:latest \
--model-id wordcab/t5-small-email-summarizer \
--max-input-length 512 \
--max-total-tokens 662
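With the container above running, the server can be queried over HTTP. The sketch below assumes the standard text-generation-inference `/generate` endpoint on the mapped port 8080. It caps `max_new_tokens` at 150, the gap between `--max-total-tokens 662` and `--max-input-length 512`. The helper names `build_tgi_payload` and `tgi_generate` are hypothetical.

```python
import json
import urllib.request

MAX_INPUT_LENGTH = 512   # matches --max-input-length in the docker command
MAX_TOTAL_TOKENS = 662   # matches --max-total-tokens in the docker command


def build_tgi_payload(prompt: str, max_new_tokens: int = 50) -> dict:
    """Build a /generate request body, keeping generation within the token budget."""
    budget = MAX_TOTAL_TOKENS - MAX_INPUT_LENGTH  # 150 tokens left for output
    return {
        "inputs": prompt,
        "parameters": {"max_new_tokens": min(max_new_tokens, budget)},
    }


def tgi_generate(prompt: str, max_new_tokens: int = 50,
                 url: str = "http://localhost:8080/generate") -> str:
    """POST the prompt to a locally running server (assumed URL) and return the text."""
    data = json.dumps(build_tgi_payload(prompt, max_new_tokens)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["generated_text"]
```

A call such as `tgi_generate("summarize_full: Subject: Meeting. Body: ...", max_new_tokens=150)` would request a full summary once the server is up.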
T5ForConditionalGeneration(
  (shared): Embedding(32128, 512)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32128, 512)
    (block): ModuleList(
      (0-5): T5Block(...)
    )
    (final_layer_norm): T5LayerNorm()
    (dropout): Dropout(p=0.1)
  )
  (decoder): T5Stack(...)
  (lm_head): Linear(in_features=512, out_features=32128, bias=False)
)
If you use this model, please cite:
@misc{wordcab2025t5email,
title={T5 Email Summarizer - Brief & Full},
author={Wordcab Team},
year={2025},
publisher={HuggingFace},
url={https://huggingface.co/wordcab/t5-small-email-summarizer}
}
This model is released under the Apache 2.0 License.
For questions or feedback, please open an issue on the model repository.
Base model
Falconsai/text_summarization