Hi everyone,
I’d like to share TickBlock, a physics-inspired alternative to transformer attention. Instead of QKᵀ, it uses a learnable banded positional operator (“tensor mode”) motivated by my research in fundamental physics.
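To make the idea concrete, here's a minimal, purely illustrative sketch of what a "banded positional operator" in place of QKᵀ attention could look like. This is my own toy parameterization (per-offset learnable weights inside a causal band, row-normalized with softmax), not the actual TickBlock implementation; names like `BandedPositionalMixer` and the `bandwidth` parameter are assumptions, so please see the repo for the real operator.

```python
import torch
import torch.nn as nn


class BandedPositionalMixer(nn.Module):
    """Illustrative stand-in for attention: tokens are mixed by a learnable
    banded operator over positions rather than a data-dependent Q·Kᵀ score."""

    def __init__(self, d_model: int, bandwidth: int = 8):
        super().__init__()
        self.bandwidth = bandwidth
        # one learnable weight per relative offset inside the band
        # (this parameterization is a guess; the real repo may differ)
        self.band = nn.Parameter(torch.zeros(2 * bandwidth + 1))
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, d = x.shape
        pos = torch.arange(t, device=x.device)
        rel = pos[None, :] - pos[:, None]                  # relative offsets, (t, t)
        logits = self.band[(rel + self.bandwidth).clamp(0, 2 * self.bandwidth)]
        # keep only causal, in-band positions
        blocked = (rel > 0) | (rel.abs() > self.bandwidth)
        logits = logits.masked_fill(blocked, float("-inf"))
        weights = torch.softmax(logits, dim=-1)            # row-normalized band matrix
        return self.proj(weights @ x)                      # purely positional mixing


# quick smoke test
mixer = BandedPositionalMixer(d_model=64)
y = mixer(torch.randn(2, 32, 64))
print(y.shape)  # torch.Size([2, 32, 64])
```

The point of the sketch is only that the mixing weights depend on position, not on the token content, which is what removes the QKᵀ computation.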
- 0.64M parameters (≈0.5% of GPT-2 small)
- Matches GPT-2 small–level performance on Tiny Shakespeare
- Trains in ~12 minutes on a Mac laptop (MPS)
- No kernel optimizations yet, so there's headroom left
- GitHub repo
- Physics background (if curious)
I'm curious what the HF community thinks. This is an early prototype, and I see room for further gains in several directions: hyperparameter tuning, kernel optimization, and general code optimization. Would love your feedback and ideas.