GPT‑2 LoRA – Dialogue Summarization (SAMSum)

This model is a Parameter‑Efficient Fine‑Tuned (PEFT) version of GPT‑2 using Low‑Rank Adaptation (LoRA).
It was trained on the SAMSum dataset to generate concise summaries of messenger‑style dialogues.

Intended Use

  • Primary task: Dialogue summarization
  • Input: A raw dialogue string (e.g., multi‑turn conversation)
  • Output: A short, fluent summary
  • Example prompt format used during training:
    Summarize the following dialogue.
    
    Dialogue:
    {dialogue}
    
    Summary:
    

The model performs best on conversations similar to SAMSum (informal, chat‑like, English).
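
As a concrete illustration, a filled-in prompt looks like this (the dialogue is a made-up example in the SAMSum chat style, not an actual SAMSum sample):

    Summarize the following dialogue.

    Dialogue:
    Anna: Are we still on for lunch tomorrow?
    Ben: Yes, 12:30 at the usual place?
    Anna: Perfect, see you there!

    Summary:

The model continues the text after "Summary:" with a short summary of the conversation.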

How to Use

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "openai-community/gpt2"
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(base_model_id)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, "violetar/gpt2-samsum-lora")

# Merge the adapter weights into the base model for plain-transformers inference
model = model.merge_and_unload()
model.eval()

def generate_summary(dialogue: str, max_new_tokens: int = 128) -> str:
    prompt = f"Summarize the following dialogue.\n\nDialogue:\n{dialogue}\n\nSummary:\n"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=768)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding
            pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
        )
    summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Keep only the text generated after the "Summary:" marker
    summary = summary.split("Summary:\n")[-1].strip()
    return summary
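
A quick call (the dialogue below is an illustrative, made-up snippet, not taken from SAMSum):

dialogue = (
    "Anna: Are we still on for lunch tomorrow?\n"
    "Ben: Yes, 12:30 at the usual place?\n"
    "Anna: Perfect, see you there!"
)
print(generate_summary(dialogue))

Calling merge_and_unload() folds the adapter weights into the base model so generation runs through a plain transformers model. If you want to keep training the adapter instead, skip that call and work with the PeftModel directly.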

Training Details

  • Base model: openai-community/gpt2 (124M parameters)
  • Dataset: SAMSum – 14,732 dialogue-summary pairs (train split)
  • Training script: Modified version of gpt2.py (provided in the lab)
  • LoRA configuration (mirrored in the configuration sketch after this list):
    • r = 32
    • lora_alpha = 16
    • lora_dropout = 0.0
    • Target modules: ["c_attn", "c_proj", "c_fc"] (all GPT‑2 attention & FFN projections)
  • Hyperparameters:
    • Max sequence length: 768 tokens
    • Batch size: 4 (per device), gradient accumulation steps: 4 → effective batch size 16
    • Learning rate: 2e‑4 (AdamW, warmup ratio 0.03)
    • Epochs: 3
    • Mixed precision: bf16 if supported, else fp16
    • Optimizer: adamw_torch
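
For reference, the settings above map roughly onto the following peft / transformers setup. This is a minimal sketch, not the actual training script: the values come from the list above, while anything not listed there (task_type, output_dir, the bf16/fp16 switch) is an assumption.

import torch
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["c_attn", "c_proj", "c_fc"],  # GPT-2 attention & FFN projections
    task_type="CAUSAL_LM",  # assumption: standard causal-LM task type
)

use_bf16 = torch.cuda.is_available() and torch.cuda.is_bf16_supported()

training_args = TrainingArguments(
    output_dir="gpt2-samsum-lora",  # assumption: placeholder output path
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # 4 x 4 = effective batch size 16
    learning_rate=2e-4,
    warmup_ratio=0.03,
    num_train_epochs=3,
    optim="adamw_torch",
    bf16=use_bf16,          # bf16 if supported, else fp16 (as stated above)
    fp16=not use_bf16,
)
# Note: the max sequence length (768 tokens) is applied at tokenization time,
# not through TrainingArguments.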