Physics-inspired attention: GPT-2-small performance with only 0.64M params, trained in 12min on a Mac

Hi everyone,

I’d like to share TickBlock, a physics-inspired alternative to transformer attention. Instead of QKᵀ, it uses a learnable banded positional operator (“tensor mode”) motivated by my research in fundamental physics.

  • 0.64M parameters (≈0.5% of GPT-2 small)

  • Matches GPT-2 small–level performance on Tiny Shakespeare

  • Trains in ~12 minutes on a Mac laptop (MPS)

  • This is without any kernel optimizations — so there’s headroom left

GitHub repo
Physics background (if curious)
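
To make the idea concrete, here is a minimal sketch of what a causal, learnable banded positional operator could look like in PyTorch, i.e. mixing weights that depend only on relative position within a fixed band instead of data-dependent QKᵀ scores. This is my own illustration under stated assumptions: the class and parameter names (`BandedPositionalMixer`, `bandwidth`, `band_weights`) are hypothetical and not taken from the TickBlock repo, and the actual "tensor mode" may differ in detail.

```python
# Illustrative sketch (not the TickBlock implementation): a causal banded
# positional mixer where the attention pattern is a learned function of
# relative offset only, restricted to a band of width `bandwidth`.
import torch
import torch.nn as nn


class BandedPositionalMixer(nn.Module):
    def __init__(self, d_model: int, bandwidth: int = 8):
        super().__init__()
        self.bandwidth = bandwidth
        # One learnable scalar per relative offset 0..bandwidth
        # (causal: a token mixes only with itself and earlier tokens).
        self.band_weights = nn.Parameter(torch.zeros(bandwidth + 1))
        self.value = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        T = x.size(1)
        v = self.value(x)

        # rel[i, j] = i - j, the relative offset of position i back to position j.
        idx = torch.arange(T, device=x.device)
        rel = idx.unsqueeze(1) - idx.unsqueeze(0)                  # (T, T)
        in_band = (rel >= 0) & (rel <= self.bandwidth)

        # Gather the learned per-offset scores, mask everything outside the band,
        # and normalize each row over the allowed offsets.
        scores = self.band_weights[rel.clamp(0, self.bandwidth)]   # (T, T)
        scores = scores.masked_fill(~in_band, float("-inf"))
        attn = scores.softmax(dim=-1)

        return self.out(attn @ v)                                  # (batch, seq_len, d_model)


# Toy usage:
if __name__ == "__main__":
    mixer = BandedPositionalMixer(d_model=64, bandwidth=8)
    y = mixer(torch.randn(2, 32, 64))
    print(y.shape)  # torch.Size([2, 32, 64])
```

One appeal of this kind of operator, if I read the idea right, is that the mixing matrix depends only on positions, so it can be precomputed for a fixed sequence length and carries very few parameters compared to full QKᵀ attention.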

I’m curious what the HF community thinks. The early prototype already shows potential for big gains, and it could be optimized further in several ways (hyperparameter tuning, kernel optimization, code optimization). Would love your feedback and ideas.


A positive idea.

Just wondering: what if you scale it up? Instead of a Mac laptop, use a bigger compute source and see what the results are.


Thanks! It is still fresh: I only started yesterday, and I wasn’t much into AI before, though I have a background in software development.

I just have a feeling that the more people play with it, the better the results will get, and these starting results already look very promising. I’m not the biggest expert to judge them, but the numbers seem very good compared to the benchmark.


Gather a team of experts, dude, like undergrads/grads in AI who could help you elevate this thing and make it research-worthy.
