Interactive multidimensional analysis of 35 frontier models.
Models plotted in the two-dimensional space of independent analysis (Monitoring) and strategic belief revision (Control). Marker size indicates overall MMS.
Raw ipsative profiles across the four core sub-abilities.
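An ipsative profile expresses each sub-ability relative to the model's own average. Every profile therefore sums to zero, and only its shape carries information. The Python sketch below assumes that "ipsative" means a model's raw sub-ability score minus that model's mean across its sub-abilities. The dashboard's exact scoring is not stated here. The scores and the fourth ability name "SubAbility4" are hypothetical placeholders.

```python
from statistics import mean


def ipsative_profile(raw_scores: dict[str, float]) -> dict[str, float]:
    """Center one model's raw sub-ability scores on that model's own mean.

    Assumption: ipsative = within-model deviation from the mean across
    sub-abilities; the dashboard's exact scoring may differ.
    """
    center = mean(raw_scores.values())
    return {ability: score - center for ability, score in raw_scores.items()}


# Hypothetical raw scores for one model; "SubAbility4" is a placeholder name.
raw = {"Monitoring": 70.0, "Control": 64.0, "Evaluation": 52.0, "SubAbility4": 66.0}
profile = ipsative_profile(raw)
print(profile)
# → {'Monitoring': 7.0, 'Control': 1.0, 'Evaluation': -11.0, 'SubAbility4': 3.0}
```

Under the table's color legend, positive entries such as Monitoring here would be shaded blue and negative ones such as Evaluation red.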
Aggregate profiles by model family, comparing the cognitive signatures of different families.
Blue = positive ipsative; red = negative.
Within each family, Evaluation improves by 5–12 points as scale increases, while Control shows no trend. This dissociation replicates across all 12 families.
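The per-family scale effect can be computed as a score difference within each family. In the Python sketch below, "improvement with scale" is taken to mean the score of a family's largest model minus the score of its smallest. That definition is an assumption, and a regression slope against model size would be an equally reasonable choice. All sizes and scores are hypothetical.

```python
def scale_gain(family_scores: list[tuple[float, float]]) -> float:
    """Score gain from the smallest to the largest model in one family.

    Each tuple is (model_size, score). Assumption: the scale effect is
    measured as largest-minus-smallest, not as a fitted slope.
    """
    ordered = sorted(family_scores)  # tuples sort by model size first
    return ordered[-1][1] - ordered[0][1]


# Hypothetical (size_in_billions, score) pairs for a single family.
evaluation = [(8.0, 41.0), (70.0, 47.5), (400.0, 50.0)]
control = [(8.0, 58.0), (70.0, 57.0), (400.0, 58.5)]
print(scale_gain(evaluation), scale_gain(control))  # → 9.0 0.5
```

Under this reading, a dissociation means the Evaluation gain stays clearly positive in every family while the Control gain stays near zero.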
Argument-evaluators (Anthropic) revise their beliefs based on the logic of an argument. Statistics-followers (xAI, GPT-5.x) revise toward the majority position.
The normative/informational judge axis correlates with adversarial robustness (ρ = −0.82, p = 0.002), making it the strongest predictor found.
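ρ here is a Spearman rank correlation, which is the Pearson correlation of the two variables' ranks. The dependency-free Python sketch below computes it with tied values sharing their average rank. It computes only ρ. `scipy.stats.spearmanr` would also return the p-value. The judge-axis and robustness values are hypothetical and are not the dashboard's data.

```python
from math import sqrt
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical judge-axis positions and adversarial-robustness scores.
judge_axis = [0.1, 0.3, 0.5, 0.7, 0.9]
robustness = [0.8, 0.7, 0.75, 0.4, 0.2]
print(round(spearman_rho(judge_axis, robustness), 3))  # → -0.9
```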
Across all 35 models, Evaluation is the most negative ipsative ability, with no exception found. Self-evaluation is the systematic bottleneck.
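The "no exception" claim is a universal check: in every model's ipsative profile, Evaluation must be the lowest entry. A minimal Python sketch follows. The model names, the ability name "SubAbility4", and all profile values are hypothetical placeholders.

```python
def weakest_ability(profile: dict[str, float]) -> str:
    """Name of the most negative (lowest) ipsative sub-ability."""
    return min(profile, key=profile.get)


def bottleneck_holds(
    profiles: dict[str, dict[str, float]], ability: str = "Evaluation"
) -> bool:
    """True if `ability` is the lowest entry in every model's profile."""
    return all(weakest_ability(p) == ability for p in profiles.values())


# Hypothetical ipsative profiles (each sums to zero); names are placeholders.
profiles = {
    "model_a": {"Monitoring": 7.0, "Control": 1.0, "Evaluation": -11.0, "SubAbility4": 3.0},
    "model_b": {"Monitoring": 2.0, "Control": 4.0, "Evaluation": -9.0, "SubAbility4": 3.0},
}
print(bottleneck_holds(profiles))  # → True
```

A single model whose lowest entry was some other ability would make `bottleneck_holds` return False.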