Endorsement Request (cs.AI / cs.LG) - Project Janus Part II: Mechanistic Validation of Orthogonal Regularization via Sparse Autoencoder Analysis

Abstract/Context:

I am an independent researcher (Exorobourii LLC) seeking an endorsement to submit my latest paper to the cs.AI (Artificial Intelligence) or cs.LG (Machine Learning) category on arXiv.

In Project Janus Part I, I demonstrated that “Vector Space Homeostasis” (VSM)—a technique enforcing orthogonal regularization—improved logical coherence in 40M-parameter models. In this paper (Part II), I move from behavioral metrics to causal evidence, utilizing Sparse Autoencoders (SAEs) to decompose the model’s internal representations.
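The exact VSM loss is defined in Part I; purely as a hedged illustration of the general idea, a generic orthogonal-regularization penalty on per-head representation vectors might look like the following (the function name, shapes, and loss form are my own assumptions, not the paper's):

```python
import numpy as np

def orthogonality_penalty(heads: np.ndarray) -> float:
    """Sum of squared off-diagonal cosine similarities between per-head
    representation vectors (rows of `heads`, shape (n_heads, d)).

    A generic orthogonal-regularization term, not the exact VSM loss:
    it is zero when all heads are mutually orthogonal and grows as
    their directions align.
    """
    # Normalize rows so only direction, not magnitude, is penalized.
    normed = heads / np.linalg.norm(heads, axis=1, keepdims=True)
    gram = normed @ normed.T                  # pairwise cosine similarities
    off_diag = gram - np.diag(np.diag(gram))  # zero the diagonal (self-similarity)
    return float(np.sum(off_diag ** 2))
```

In training, such a term would typically be scaled by a coefficient and added to the task loss, pushing heads toward orthogonal subspaces.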

Key Findings:

Mechanistic Validation: We observed a 60.3% reduction in inter-head correlation, confirming that VSM geometrically disentangles attention-head representation subspaces.

The “Sparsity Crossover”: SAE analysis revealed a novel “Filter-then-Pack” strategy: compared to controls, adaptive models are significantly sparser in early layers (noise filtering) but denser in deep layers (semantic packing).

Load-Bearing Heads: Ablation studies identified the emergence of “Super-Heads” that carry 2.54x the functional importance of control heads, marking a shift from redundant syntactic scaffolding to distinct semantic binding.
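For readers unfamiliar with zero-mask ablation, here is a minimal sketch of the general technique on a toy linear "model" (the `forward` function, MSE loss, and `(n_heads, d)` weight layout are illustrative assumptions, not Janus internals):

```python
import numpy as np

def zero_mask_importance(forward, heads, x, y):
    """Zero-mask ablation: silence one head at a time and record the
    resulting loss increase. Larger scores mean more load-bearing heads.

    Illustrative sketch only -- `forward`, the MSE loss, and the
    (n_heads, d) weight layout are assumptions, not the paper's pipeline.
    """
    def loss(h):
        return float(np.mean((forward(h, x) - y) ** 2))
    baseline = loss(heads)
    scores = []
    for i in range(heads.shape[0]):
        ablated = heads.copy()
        ablated[i] = 0.0                       # zero-mask head i
        scores.append(loss(ablated) - baseline)
    return np.array(scores)

# Toy usage: two "heads" whose outputs are summed; head 0 does all the work.
forward = lambda h, x: x @ h.sum(axis=0)
x = np.array([[1.0, 2.0], [3.0, 4.0]])
heads = np.array([[1.0, 0.0], [0.0, 0.0]])     # head 1 is dead weight
y = x[:, 0]
scores = zero_mask_importance(forward, heads, x, y)
# scores[0] > scores[1]: head 0 is the load-bearing head.
```

In this framing, a "Super-Head" is one whose ablation score is a large multiple of the typical head's.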

Reproducibility & Artifacts:

I believe strongly in “Glass Box” research. To that end, I have released the “Janus Mechanistic Workbench,” which includes:

Model Checkpoints: Full PyTorch state dictionaries for both the Janus-v3-Control and Janus-v3-Adaptive models.

Telemetry Data: The raw CSV outputs from our topology analysis, zero-mask ablation studies, and the Sparse Autoencoder (SAE) feature-density metrics.

The model .pt files are available on Hugging Face: EXOROBOURII/Janus_v3_Initial