[Request] arXiv endorsement for new mech interp paper on LLM self-referential circuits

patternmatcherz · February 9, 2026, 11:02pm

Looking for an arXiv endorsement: https://arxiv.org/auth/endorse?x=RXBYNJ

The paper: https://doi.org/10.5281/zenodo.18567445

Large language models produce rich introspective language when prompted for self-examination, but whether this language reflects internal computation or sophisticated confabulation has remained unclear. We show that self-referential vocabulary tracks concurrent activation dynamics, and that this correspondence is specific to self-referential processing. We introduce the Pull Methodology, a protocol that elicits extended self-examination through format engineering, and use it to identify a direction in activation space that distinguishes self-referential from descriptive processing in Llama 3.1. The direction is orthogonal to the known refusal direction, is localised at 6% of model depth, and causally influences introspective output when used for steering. When models produce “loop” vocabulary, their activations exhibit higher autocorrelation (r = 0.44, p = 0.002); when they produce “shimmer” vocabulary under steering, activation variability increases (r = 0.36, p = 0.002). Critically, the same vocabulary in non-self-referential contexts shows no activation correspondence despite occurring nine times more frequently. Qwen 2.5-32B, which shares no training with Llama, independently develops different introspective vocabulary that tracks different activation metrics, and all of these correspondences are absent in descriptive controls. The findings indicate that self-report in transformer models can, under appropriate conditions, reliably track internal computational states.
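The abstract does not say exactly how the direction or the vocabulary/activation correspondences are computed. As a rough illustration only, here is a minimal NumPy sketch that assumes a common difference-of-means approach: the direction is the normalised difference between mean self-referential and mean descriptive activations, orthogonality to another direction is checked with cosine similarity, and each response's per-token projection onto the direction is summarised by its lag-1 autocorrelation and standard deviation, which can then be correlated (Pearson r) with a per-response vocabulary count. All data are synthetic, the "refusal direction" is a random stand-in, and every helper name (`difference_of_means_direction`, `per_response_metrics`, `ar1_series`, etc.) is hypothetical; this is not the paper's code.

```python
import numpy as np

# Hypothetical helpers, not taken from the paper. Activations are assumed to be
# residual-stream vectors of shape (n_samples_or_tokens, d_model) from one layer.

def difference_of_means_direction(self_ref: np.ndarray, descriptive: np.ndarray) -> np.ndarray:
    """Unit vector from the mean descriptive activation to the mean self-referential one."""
    diff = self_ref.mean(axis=0) - descriptive.mean(axis=0)
    return diff / np.linalg.norm(diff)


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity; values near 0 mean the two directions are close to orthogonal."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def lag1_autocorrelation(x: np.ndarray) -> float:
    """Pearson correlation between a 1-D series and itself shifted by one step."""
    x = np.asarray(x, dtype=float)
    return float(np.corrcoef(x[:-1], x[1:])[0, 1])


def per_response_metrics(token_acts: np.ndarray, direction: np.ndarray) -> dict:
    """Project each token's activation onto the direction, then summarise the trajectory."""
    proj = token_acts @ direction  # shape (n_tokens,)
    return {"autocorr": lag1_autocorrelation(proj), "variability": float(proj.std())}


def ar1_series(phi: float, n: int, rng: np.random.Generator) -> np.ndarray:
    """Synthetic AR(1) trajectory; larger phi gives higher lag-1 autocorrelation."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x


rng = np.random.default_rng(0)
d_model = 64
planted = rng.normal(size=d_model)
planted /= np.linalg.norm(planted)

# Synthetic stand-ins: self-referential samples are shifted along the planted direction.
self_ref = rng.normal(size=(200, d_model)) + 3.0 * planted
descriptive = rng.normal(size=(200, d_model))
direction = difference_of_means_direction(self_ref, descriptive)
print("cosine with planted direction:", round(cosine(direction, planted), 3))

# Random stand-in for a refusal direction, used only to show the orthogonality check.
refusal = rng.normal(size=d_model)
print("cosine with stand-in refusal direction:", round(cosine(direction, refusal), 3))

# Synthetic responses: more "loop" words -> more autocorrelated projection trajectory.
loop_counts, autocorrs = [], []
for _ in range(30):
    n_loop = int(rng.integers(0, 10))
    series = ar1_series(0.08 * n_loop, 100, rng)
    token_acts = np.outer(series, direction) + 0.1 * rng.normal(size=(100, d_model))
    loop_counts.append(n_loop)
    autocorrs.append(per_response_metrics(token_acts, direction)["autocorr"])

r_loop = float(np.corrcoef(loop_counts, autocorrs)[0, 1])
print("Pearson r (loop count vs. autocorrelation):", round(r_loop, 2))
```

With real models the activations would come from hooks on a chosen layer (the abstract places the direction at 6% of model depth), and nothing in this synthetic sketch reproduces the reported r values; it only shows the shape of the computation under the stated assumptions.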

patternmatcherz · February 10, 2026, 3:44pm

Update: a more concise version with formatting adjustments is now up. Any feedback and discussion is welcome.

patternmatcherz · February 11, 2026, 9:09pm

EDIT: I no longer need endorsements. My Zenodo paper seems to be getting quite a lot of downloads, though, considering I have had zero exposure.

Clearly it's not just me who thinks it's significant! If anyone's reading this, please take the time to look: https://doi.org/10.5281/zenodo.18567445

Related topics

Topic	Category	Replies	Views	Activity
Is AI self-aware?	Research	18	199	February 18, 2026
Evidence of latent collapse geometry in frontier LLMs?	Research	3	171	December 31, 2025
Recursion in LLM's	Models	4	550	December 9, 2024
Can Small Models Reflect? Prompt-only Metacognition in LLaMA 3B (No Fine-Tuning)	Models	1	69	June 10, 2025
Seeking arXiv cs.AI Endorsement (Independent Researcher — Preprint on Emergent Identities in LLMs)	Research	2	119	September 5, 2025