README + eval.py: align with current bit-cascaded ternary layout

README:
- Tensor naming example used the pre-cascade modular.mod5.layer2.eq3
identifier, which no longer exists. Replaced with the current
modular.mod5.eq.k15.bit3.match form.
- Storage paragraph claimed comparator weights and wide single-layer
threshold gates use int16 / int32. Every weight and bias in the
canonical model is int8.
- Ternary-mode paragraph reported 174 buffers rewritten and 183
non-ternary tensors remaining. Audit shows zero non-ternary weights
in the canonical model: comparators, modular detectors, and division
stages have all been bit-cascaded in build.py. Updated the paragraph
to match.

eval.py:
- _test_integration's mul_then_mod sub-test was a no-op: it loaded
modular.mod3.layer1/layer2 weights (now bit-cascaded, so KeyError +
silent SKIP), and even before the rename it incremented op_scores
unconditionally without checking the circuit output. Replaced with a
real walk through the bit-cascade modular.mod3 detector via a new
_pop_modN helper, comparing the detector output against
product % 3 == 0. mul_then_mod now PASSes 4/4. Integration suite is
19/19 across all variants.
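
For readers unfamiliar with the cascade, the following is a minimal pure-Python sketch of the structure that _pop_modN walks (per-bit match gates, then a per-multiple AND, then a top-level OR). The concrete weights and biases below are assumptions chosen to be ternary and to satisfy H(0) = 1; the values build.py actually emits are not shown in this commit, and the helper names (match_gate, divisible) are hypothetical.

```python
from typing import List, Tuple


def heaviside(x: int) -> int:
    """Threshold activation with H(0) = 1 (consistent with H(2x - 1) == H(x - 1) on binary x)."""
    return 1 if x >= 0 else 0


def to_bits_msb(value: int, width: int = 8) -> List[int]:
    """MSB-first bit list, the same ordering eval.py uses for prod_bits."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def match_gate(k_bit: int) -> Tuple[int, int]:
    """Assumed ternary (weight, bias) that fires iff the input bit equals k_bit."""
    # k_bit == 1: H(x - 1) is 1 only for x = 1.
    # k_bit == 0: H(-x)    is 1 only for x = 0.
    return (1, -1) if k_bit else (-1, 0)


def divisible(value: int, modulus: int) -> int:
    """Return 1 iff the 8-bit value equals some multiple k of modulus."""
    bits = to_bits_msb(value)
    alls = []
    for k in range(0, 256, modulus):
        gates = [match_gate(kb) for kb in to_bits_msb(k)]
        matches = [heaviside(w * x + b) for x, (w, b) in zip(bits, gates)]
        # AND over the 8 match outputs: weights all +1, bias -8 (assumed).
        alls.append(heaviside(sum(matches) - 8))
    # OR over all multiples: weights all +1, bias -1 (assumed).
    return heaviside(sum(alls) - 1)
```

Every weight in this sketch is in {-1, 0, 1}, which is the point of the cascade: the positional weights of a single-layer comparator are traded for extra depth.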

Files changed (2)

README.md +3 -3
eval.py +31 -14

README.md CHANGED

@@ -195,7 +195,7 @@ Examples:
   boolean.and.weight
   boolean.xor.layer1.neuron1.weight
   arithmetic.ripplecarry8bit.fa7.ha2.sum.layer1.or.weight
-  modular.mod5.layer2.eq3.weight
+  modular.mod5.eq.k15.bit3.match.weight
   error_detection.paritychecker8bit.stage2.xor1.layer1.nand.bias
 ```
@@ -253,9 +253,9 @@ python quantize.py file.safetensors --ternary         # push toward {-1, 0, 1} w
 python quantize.py file.safetensors --ternary --strict  # error if any weight is non-ternary
 ```
-Most tensors fit in `int8`; comparator weights and a few wide single-layer threshold gates use `int16` or `int32`. The eval pipeline promotes weights to `float32` on load, so integer storage is exact and transparent.
+Every weight and bias tensor in the canonical model fits in `int8`. The eval pipeline promotes weights to `float32` on load, so integer storage is exact and transparent.
-**Ternary mode.** With `--ternary`, the quantizer also rewrites single-input `weight=±2` identity buffers (SHL/SHR/ROL/ROR bit gates, stack data buffers, RET address buffers, flag buffers) as `weight=±1` with bias adjusted to preserve the heaviside output for binary inputs (`H(2x - 1) ≡ H(x - 1)` etc.). After this pass the canonical model has 174 buffer gates rewritten and 183 weight tensors remaining non-ternary, all of which are positional comparators (8/16-bit single-layer, byte-level cascade gates, division-stage comparators) and a handful of hand-constructed modular arithmetic gates. Fully ternarizing those requires bit-cascading them in `build.py`, which is a structural change rather than a quantization pass. The metadata field `weight_quantization` records `ternary` (clean) or `ternary_partial` (some violations remain).
+**Ternary mode.** With `--ternary`, the quantizer also rewrites single-input `weight=±2` identity buffers (SHL/SHR/ROL/ROR bit gates, stack data buffers, RET address buffers, flag buffers) as `weight=±1` with bias adjusted to preserve the heaviside output for binary inputs (`H(2x - 1) ≡ H(x - 1)` etc.). The canonical model has zero non-ternary weights as built; the comparators, modular detectors, and division stages that previously required positional weights up to ±2³¹ have all been bit-cascaded into multi-layer ternary equivalents in `build.py`. The metadata field `weight_quantization` records `ternary` (clean) or `ternary_partial` (some violations remain).
 ---
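
The buffer rewrite described in the Ternary-mode paragraph can be checked by brute force, since a single-input gate only ever sees x = 0 or x = 1. The sketch below is illustrative only and is not the quantize.py implementation: ternarize_buffer is a hypothetical helper, and the bias search range is an assumption. It reproduces the README identity H(2x - 1) ≡ H(x - 1); the negative-weight case is one plausible reading of the "etc." and is not confirmed by the source.

```python
from typing import Tuple


def heaviside(x: int) -> int:
    """Threshold activation with H(0) = 1, as the README identity requires."""
    return 1 if x >= 0 else 0


def ternarize_buffer(weight: int, bias: int) -> Tuple[int, int]:
    """Find (w', b') with w' = sign(weight) that matches the gate on x in {0, 1}.

    Hypothetical helper: intended for the weight = +/-2 buffers only.
    """
    w = 1 if weight > 0 else -1
    target = [heaviside(weight * x + bias) for x in (0, 1)]
    for b in range(-2, 3):  # small search window (assumption)
        if [heaviside(w * x + b) for x in (0, 1)] == target:
            return w, b
    raise ValueError("no ternary equivalent found in the search window")


# README case: H(2x - 1) becomes H(x - 1).
print(ternarize_buffer(2, -1))
# Assumed inverting case: H(-2x + 1) becomes H(-x).
print(ternarize_buffer(-2, 1))
```

The equivalence holds only for binary inputs; for other input values the two gates can disagree, which is why the rewrite is restricted to single-input buffers.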

eval.py CHANGED

@@ -4253,6 +4253,31 @@ class BatchedFitnessEvaluator:
     # INTEGRATION TESTS (Multi-circuit chains)
     # =========================================================================
+    def _pop_modN(self, pop: Dict, pop_size: int, val_bits: torch.Tensor,
+                  modulus: int) -> torch.Tensor:
+        """Drive the bit-cascade modular.mod{N} divisibility detector.
+        Returns a (pop_size,) tensor: 1 iff the 8-bit value (MSB-first bits in
+        val_bits) is divisible by ``modulus``. Walks the per-multiple match
+        gates (modular.modN.eq.k{val}.bit{i}.match -> .all -> top-level OR).
+        """
+        ks = [k for k in range(256) if k % modulus == 0]
+        alls = []
+        for k in ks:
+            matches = []
+            for i in range(8):
+                w = pop[f'modular.mod{modulus}.eq.k{k}.bit{i}.match.weight'].view(pop_size, 1)
+                b = pop[f'modular.mod{modulus}.eq.k{k}.bit{i}.match.bias'].view(pop_size)
+                matches.append(heaviside(val_bits[i] * w[:, 0] + b))
+            all_inp = torch.stack(matches, dim=-1)
+            w_all = pop[f'modular.mod{modulus}.eq.k{k}.all.weight'].view(pop_size, 8)
+            b_all = pop[f'modular.mod{modulus}.eq.k{k}.all.bias'].view(pop_size)
+            alls.append(heaviside((all_inp * w_all).sum(-1) + b_all))
+        top_inp = torch.stack(alls, dim=-1)
+        w_top = pop[f'modular.mod{modulus}.weight'].view(pop_size, len(ks))
+        b_top = pop[f'modular.mod{modulus}.bias'].view(pop_size)
+        return heaviside((top_inp * w_top).sum(-1) + b_top)
     def _pop_cmp8bit(self, pop: Dict, pop_size: int,
                      a_bits: torch.Tensor, b_bits: torch.Tensor,
                      kind: str) -> torch.Tensor:
@@ -4358,22 +4383,14 @@ class BatchedFitnessEvaluator:
             tests = [(3, 5), (4, 6), (7, 11), (9, 9)]
             for a, b in tests:
                 product = (a * b) & 0xFF
-                expected_mod3 = product % 3
-                # Test using mod3 circuit
+                expected = float(product % 3 == 0)
+                # Drive product bits through the bit-cascade mod3 detector;
+                # output is 1 iff product is divisible by 3.
                 prod_bits = torch.tensor([((product >> (7 - i)) & 1) for i in range(8)],
-                                        device=self.device, dtype=torch.float32)
+                                         device=self.device, dtype=torch.float32)
-                # mod3 has layer1 and layer2
-                w1 = pop['modular.mod3.layer1.weight'].view(pop_size, 8)
-                b1 = pop['modular.mod3.layer1.bias'].view(pop_size)
-                h1 = heaviside((prod_bits * w1).sum(-1) + b1)
-                w2 = pop['modular.mod3.layer2.weight'].view(pop_size, 8)
-                b2 = pop['modular.mod3.layer2.bias'].view(pop_size)
-                h2 = heaviside((prod_bits * w2).sum(-1) + b2)
-                # Combine to get residue (simplified: check if output matches expected)
-                op_scores += 1  # Simplified test
+                out = self._pop_modN(pop, pop_size, prod_bits, 3)
+                op_scores += (out == expected).float()
                 op_total += 1
             scores += op_scores
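
As a quick sanity check on the mul_then_mod test vectors, the sketch below recomputes the products, their MSB-first bit patterns, and the expected detector outputs in plain Python, without torch or the model weights. The names to_bits_msb and rows are illustrative and do not exist in eval.py.

```python
from typing import List


def to_bits_msb(value: int, width: int = 8) -> List[int]:
    """Same MSB-first expansion as the prod_bits expression in the diff."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


tests = [(3, 5), (4, 6), (7, 11), (9, 9)]
rows = []
for a, b in tests:
    product = (a * b) & 0xFF            # largest product is 81, so the mask never truncates here
    expected = float(product % 3 == 0)  # label compared against the detector output
    rows.append((product, to_bits_msb(product), expected))

for product, bits, expected in rows:
    print(product, bits, expected)
```

Three of the four products (15, 24, 81) are divisible by 3 and one (77) is not, so the test exercises both detector outputs.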