I wanted to share a small experiment I ran on the Transformers stack, specifically `modeling_utils.py` (v5.5.0).
Instead of modifying the source, I wrapped the file in a separate runtime layer and dropped it back into the stack unchanged. The original file remains byte-identical; the only addition is an external execution layer that runs alongside it.
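As a rough sketch of what I mean by an external execution layer (the names `runtime_layer`, `before`, and `after` are mine for this sketch, not from the repo), behavior can be attached to an existing callable without ever editing its source:

```python
import functools

def runtime_layer(fn, before=None, after=None):
    """Wrap a callable with optional hooks.

    `before` and `after` are hypothetical hook names used only for
    this illustration; the actual layer may be structured differently.
    """
    @functools.wraps(fn)
    def wrapped(*args, **kwargs):
        if before:
            before(args, kwargs)   # e.g. request validation or logging
        result = fn(*args, **kwargs)
        if after:
            after(result)          # e.g. execution-time observation
        return result
    return wrapped

# The wrapped function's source is never touched; only the binding changes.
def tokenize(text):
    return text.split()

seen = []
guarded = runtime_layer(tokenize, before=lambda args, kwargs: seen.append(args))
```

Because `functools.wraps` copies the original metadata, the wrapped function still looks like `tokenize` to callers that introspect it.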
From there, I tested whether I could introduce behavior without editing the original implementation:
- Basic request validation (injection / XSS patterns)
- Persistent state across calls
- Simple recovery checkpoints
- Execution-time observation
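A minimal sketch of those four behaviors wrapped around a plain function follows; all names (`RuntimeLayer`, `BLOCK_PATTERNS`, and so on) are illustrative assumptions, not the repo's actual API:

```python
import copy
import re
import time

# Toy deny-list standing in for real injection / XSS detection.
BLOCK_PATTERNS = [re.compile(p, re.IGNORECASE)
                  for p in (r"<script\b", r";\s*drop\s+table")]

class RuntimeLayer:
    def __init__(self, fn):
        self.fn = fn
        self.state = {"calls": 0}   # persistent state across calls
        self.checkpoints = []       # simple recovery checkpoints
        self.timings = []           # execution-time observation

    def checkpoint(self):
        self.checkpoints.append(copy.deepcopy(self.state))

    def restore(self):
        if self.checkpoints:
            self.state = self.checkpoints.pop()

    def __call__(self, text):
        # Basic request validation before the wrapped code ever runs.
        if any(p.search(text) for p in BLOCK_PATTERNS):
            raise ValueError("blocked by runtime layer")
        self.state["calls"] += 1
        start = time.perf_counter()
        try:
            return self.fn(text)
        finally:
            self.timings.append(time.perf_counter() - start)
```

For example, `RuntimeLayer(str.upper)` passes normal strings through untouched while rejecting anything matching the deny-list, and `checkpoint()`/`restore()` snapshot the state dict between calls.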
After reinserting the file, the stack still behaved normally. I then ran a few small tests against a model to see if the added layer would actually execute in practice.
Example results:
- Malicious inputs were blocked by the runtime layer
- Normal model usage was unaffected
- State persisted across calls without touching model code
The repo (full copy of Transformers + runtime layer) is here:
What I’m trying to explore
The idea is pretty simple:
Can behavior like validation, memory, or observability be added around a system instead of inside it?
Not proposing this as a replacement for existing patterns—just exploring whether a runtime-layer approach can be made consistent across different types of software.
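One way to make "around instead of inside" concrete is a forwarding proxy: every attribute of the wrapped object passes through untouched except the ones you explicitly intercept. This is my own sketch of the pattern, not code from the repo:

```python
# `Around` is a hypothetical name for illustration. It forwards all
# attribute access to the target and intercepts only named methods.
class Around:
    def __init__(self, target, intercept):
        self._target = target
        self._intercept = intercept  # {method_name: wrapper(bound_method)}

    def __getattr__(self, name):
        attr = getattr(self._target, name)
        wrapper = self._intercept.get(name)
        return wrapper(attr) if wrapper else attr

# Example: add observability to `append` without subclassing list.
log = []

def logged(fn):
    def inner(*args):
        log.append(args)  # record the call before forwarding it
        return fn(*args)
    return inner

xs = Around([], {"append": logged})
```

The design trade-off versus subclassing or monkey-patching is that the target object and its class stay completely unmodified; only callers who go through the proxy see the added behavior.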
What I’d love feedback on
- Is this actually useful compared to standard hooks/middleware?
- Where would something like this break in real-world usage?
- Are there existing patterns in Transformers I should be leveraging instead?
Appreciate anyone taking a look.