Wave Field LLM — O(n log n) attention via wave equation dynamics, within 5% of standard transformer

I’ve been working on an alternative attention mechanism that treats language
as a physical field system instead of using standard O(n²) self-attention.

How it works:

  • Tokens are mapped onto a continuous 1D field
  • Information propagates via damped-wave dynamics; the propagation kernel is k(t) = exp(-α·t)·cos(ω·t + φ)
  • Each attention head has just 3 learnable physics parameters (frequency, damping, phase)
  • Convolution is computed via FFT in O(n log n) (see the sketch after this list)
  • Heads self-organize into different roles (local grammar, medium context, long-range)
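
To make that concrete, here is a minimal sketch of the core operation, not the repository's actual code: a single head propagates token features with a damped-cosine kernel, implemented as a causal convolution via FFT. The names (`damped_wave_kernel`, `wave_head`, `alpha`, `omega`, `phi`) and the NumPy-only forward pass are illustrative assumptions; in the real model these parameters are learned per head.

```python
import numpy as np

def damped_wave_kernel(n, alpha, omega, phi):
    """k(t) = exp(-alpha*t) * cos(omega*t + phi) for t = 0..n-1 (defined only for t >= 0)."""
    t = np.arange(n)
    return np.exp(-alpha * t) * np.cos(omega * t + phi)

def wave_head(x, alpha, omega, phi):
    """Propagate the field x (shape [n, d]) with one head's kernel.

    The causal convolution is computed via the FFT convolution theorem in
    O(n log n); zero-padding to length 2n avoids circular wrap-around.
    """
    n, d = x.shape
    k = damped_wave_kernel(n, alpha, omega, phi)
    m = 2 * n                                      # padded length -> linear, not circular, convolution
    K = np.fft.rfft(k, m)                          # kernel spectrum
    X = np.fft.rfft(x, m, axis=0)                  # field spectrum, per channel
    y = np.fft.irfft(X * K[:, None], m, axis=0)    # pointwise product = convolution
    return y[:n]                                   # keep the causal part

# The three physics parameters per head set its role: heavy damping gives a
# short-range (local grammar) head, light damping a long-range head.
x = np.random.randn(16, 8)                         # 16 tokens on an 8-dim field
local_head = wave_head(x, alpha=0.9, omega=2.0, phi=0.0)
long_range_head = wave_head(x, alpha=0.01, omega=0.1, phi=0.0)
```

Zero-padding to 2n keeps the FFT convolution linear rather than circular, which is what preserves causality here: position i only receives contributions from positions ≤ i.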

Results (WikiText-2, 6M params, character tokenizer):

| Model | PPL | Accuracy | Complexity |
|---|---|---|---|
| Standard Transformer | 5.9 | 51.0% | O(n²) |
| Wave Field V3.5 | 6.2 | 50.5% | O(n log n) |

At longer sequences the compute savings over O(n²) attention grow: 31× at 2K tokens, 107× at 8K, and 367× at 32K.

Known limitations:

  • With BPE tokenizer (8K vocab), there’s a significant capacity gap vs standard transformer
  • I believe this is a model capacity issue at small scale, not an architecture flaw
  • Currently scaling to 100M params to see if the gap closes

What’s unique:

  • Every bug during development was found through physics-based diagnostics
    (energy flow, conservation, causality tests) rather than guesswork; a
    causality-check sketch follows this list
  • Cross-head field coupling and wave interference for information routing
  • Not a Mamba/Hyena variant — different approach entirely
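
As an example of those physics-based diagnostics, here is a hedged sketch of a causality test: perturb one token and assert that outputs at earlier positions are unchanged. It reuses the illustrative `wave_head` from the sketch above and is not the repo's actual test suite.

```python
import numpy as np

def causality_leak(layer, n=64, d=8, pos=40, eps=1e-3, seed=0):
    """Perturb the token at `pos` and return the largest change at positions < pos.

    For a causal layer this should sit at the level of floating-point noise.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))
    x_pert = x.copy()
    x_pert[pos] += eps                   # poke one "future" token
    y, y_pert = layer(x), layer(x_pert)
    return np.abs(y[:pos] - y_pert[:pos]).max()

# wave_head is the illustrative head from the earlier sketch
leak = causality_leak(lambda x: wave_head(x, alpha=0.1, omega=1.0, phi=0.0))
assert leak < 1e-6, f"future token leaked into the past: {leak:.3e}"
```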

Code: https://github.com/badaramoni/wave-field-llm

Happy to answer questions about the physics, architecture decisions, or results.
