April 2026 Puzzle 1(a)

Max-of-5 mechanistic analysis

A compact index for the static reports generated while reverse-engineering the one-layer attention-only transformer.

`[ANS]` Attention Patterns

Shows that Head 3's attention from [ANS] is max-selective on the sampled inputs, concentrating on the position that holds the largest digit, while Head 1 mostly attends to special tokens.
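One way to quantify "max-selective" is sketched below. It assumes you already have the [ANS] attention rows for one head. The function names, the 5-digit-position row layout, and the special-token positions are hypothetical, not the reports' actual code. For Head 3, the score is the fraction of inputs where the most-attended digit position holds the maximum digit. For Head 1, it is the mean attention mass on special tokens.

```python
from typing import Sequence


def max_selectivity(attn: Sequence[Sequence[float]],
                    digits: Sequence[Sequence[int]]) -> float:
    """Fraction of inputs where [ANS]'s most-attended digit position holds the max digit.

    attn[i][j]   -- attention weight from [ANS] to digit position j on input i
                    (hypothetical layout: the 5 digit positions only).
    digits[i][j] -- digit value at position j on input i.
    """
    hits = 0
    for weights, seq in zip(attn, digits):
        top = max(range(len(weights)), key=lambda j: weights[j])
        # Counts as a hit whenever the attended digit equals the max value,
        # even if the same max value also appears at another position.
        if seq[top] == max(seq):
            hits += 1
    return hits / len(attn)


def special_token_mass(attn_full: Sequence[Sequence[float]],
                       special_positions: Sequence[int]) -> float:
    """Mean attention mass [ANS] places on special-token positions (full-sequence rows)."""
    return sum(sum(row[p] for p in special_positions) for row in attn_full) / len(attn_full)


# Toy numbers for illustration only, not taken from the reports.
attn = [[0.1, 0.6, 0.1, 0.1, 0.1], [0.7, 0.1, 0.1, 0.05, 0.05]]
digits = [[3, 9, 2, 9, 1], [4, 8, 0, 1, 2]]
print(max_selectivity(attn, digits))  # 0.5
```

In the toy data, the first input's top position holds a 9, the maximum, so it counts as a hit. The second input's top position holds a 4 while the maximum is 8, so it counts as a miss.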

Tests whether Head 3 copies the selected max digit into the output logits. The result shows that Head 3 is not a simple diagonal copy circuit.
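A common way to test for a copy circuit is to build a digit-by-digit matrix of output-logit contributions and check whether each row peaks on its diagonal. The sketch below assumes such a matrix is given. The layout `ov[d][k]` is hypothetical: for example, the embedding of digit d pushed through the head's OV map and unembedding, restricted to digit tokens. It reports the fraction of rows whose argmax is the diagonal entry and the per-digit diagonal margin. A clean copy circuit gives a fraction of 1.0 and all-positive margins.

```python
from typing import List, Sequence


def diagonal_argmax_fraction(ov: Sequence[Sequence[float]]) -> float:
    """Fraction of source digits d whose largest output logit is digit d itself.

    ov[d][k] -- logit contribution to output digit k when the head attends to a
                token holding digit d (hypothetical precomputed matrix).
    """
    hits = sum(1 for d, row in enumerate(ov)
               if max(range(len(row)), key=lambda k: row[k]) == d)
    return hits / len(ov)


def diagonal_margin(ov: Sequence[Sequence[float]]) -> List[float]:
    """Per-digit gap between the diagonal entry and the best off-diagonal entry."""
    return [row[d] - max(v for k, v in enumerate(row) if k != d)
            for d, row in enumerate(ov)]


# Toy 3-digit matrix for illustration only; row 1 peaks off the diagonal.
ov = [[2.0, 0.5, 0.1],
      [0.3, 1.0, 1.5],
      [0.0, 0.2, 3.0]]
print(diagonal_argmax_fraction(ov))
print(diagonal_margin(ov))
```

A negative margin, as in row 1 here, is the kind of signal that rules out a simple diagonal copy circuit.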

Decomposes the final [ANS] logits into contributions from the residual stream and from each attention head, then attributes the winning answer's margin to those components.
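The margin attribution can be sketched as follows. This assumes the final logits are exactly the sum of the per-component contributions, which holds for an attention-only model with no final LayerNorm or with its scale frozen. The component names and data layout are hypothetical. Once the runner-up answer is fixed, each component's share of the margin is its winner logit minus its runner-up logit. These shares then sum exactly to the total margin.

```python
from typing import Dict, Sequence, Tuple


def attribute_margin(
    components: Dict[str, Sequence[float]],
) -> Tuple[int, int, Dict[str, float]]:
    """Split the winner-vs-runner-up logit margin across additive components.

    components[name][k] -- contribution of component `name` to the [ANS] logit
                           for answer k. Assumes the logits are exactly the sum.
    """
    n = len(next(iter(components.values())))
    total = [sum(c[k] for c in components.values()) for k in range(n)]
    ranked = sorted(range(n), key=lambda k: total[k], reverse=True)
    winner, runner_up = ranked[0], ranked[1]
    # Shares sum to total[winner] - total[runner_up] by linearity.
    shares = {name: c[winner] - c[runner_up] for name, c in components.items()}
    return winner, runner_up, shares


# Toy 3-answer example for illustration only; keys are hypothetical names.
components = {
    "residual": [0.0, 1.0, 0.5],
    "head1": [0.2, 0.1, 0.0],
    "head3": [0.0, 2.0, 3.0],
}
winner, runner_up, shares = attribute_margin(components)
print(winner, runner_up, shares)
```

Here answer 2 wins over answer 1 by about 0.4. Head 3 contributes about +1.0 of that margin, and the residual and Head 1 components pull against it.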

Measures raw pre-softmax QK scores from the [ANS] query to each digit key. Head 3 shows the cleanest monotonic ramp, with scores rising from digit 0 to digit 9.
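A simple check of the "monotonic ramp" claim is sketched below. It assumes you have one average pre-softmax score per digit value 0 through 9 for a given head. The function name and input layout are hypothetical. It reports whether the scores increase strictly at every step, along with the fraction of adjacent steps that go up. That fraction is a softer measure for comparing heads whose ramps are only roughly monotonic.

```python
from typing import Sequence, Tuple


def ramp_stats(scores: Sequence[float]) -> Tuple[bool, float]:
    """Check how monotonic a per-digit score ramp is.

    scores[d] -- mean pre-softmax QK score from the [ANS] query to a key holding
                 digit d (hypothetical precomputed values, d = 0..9).
    Returns (strictly increasing?, fraction of adjacent steps that increase).
    """
    steps = [b - a for a, b in zip(scores, scores[1:])]
    strictly_increasing = all(s > 0 for s in steps)
    frac_up = sum(1 for s in steps if s > 0) / len(steps)
    return strictly_increasing, frac_up


# Toy ramps for illustration only, not values from the reports.
clean = [-1.0, -0.5, 0.0, 0.4, 0.9, 1.1, 1.6, 2.0, 2.3, 3.0]
noisy = [0.0, 0.3, 0.2, 0.5, 0.6, 0.7, 0.65, 0.9, 1.0, 1.2]
print(ramp_stats(clean))  # (True, 1.0)
print(ramp_stats(noisy))
```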