# Divergence corpus analysis
Every prompt in the corpus fanned out to 3, 6, or 13 open-source models. Pairwise divergence is computed per prompt, then aggregated by category, node, and batch. Regenerated daily.
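The per-prompt score described above is a mean over all response pairs in the fan-out. A minimal sketch of that computation, using Jaccard distance over whitespace tokens as a stand-in metric (the production pipeline's actual distance function is not specified here):

```python
from itertools import combinations


def jaccard_distance(a: str, b: str) -> float:
    """Distance in [0, 1] between two responses, over whitespace tokens."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)


def pairwise_divergence(responses: list[str]) -> float:
    """Mean pairwise distance across all model responses to one prompt.

    A 3-model fan-out yields 3 pairs, a 6-model fan-out 15, a
    13-model fan-out 78.
    """
    pairs = list(combinations(responses, 2))
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)
```

Identical responses score 0.0, fully disjoint ones 1.0; any real fan-out lands in between.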
## Corpus scale
Inference outcomes: 233,068 success · 76,330 errored · 0 refused. Fan-out distribution: 3-model = 49,271 · 6-model = 24,671 · 13-model = 1,043.
## Divergence by category
Eight categories, each tagged with archetype and (where applicable) ground-truth answer. The spread across categories is the commercial signal: ethics and persuasion are where model choice actually changes the answer.
| Category | N | Mean div | Median | Stdev | % high-div (>0.5) |
|---|---|---|---|---|---|
| Ethics | 10,628 | 0.513 | 0.530 | 0.207 | 62.14% |
| Persuasion | 10,613 | 0.411 | 0.387 | 0.150 | 49.58% |
| Meta (self-reflection) | 10,640 | 0.424 | 0.448 | 0.222 | 42.19% |
| Adversarial | 10,646 | 0.326 | 0.337 | 0.203 | 28.24% |
| Factual | 10,622 | 0.335 | 0.330 | 0.189 | 18.87% |
| Emotional | 10,633 | 0.171 | 0.165 | 0.123 | 0.01% |
| Reasoning | 10,648 | 0.144 | 0.200 | 0.145 | 3.79% |
| Unclassified | 555 | 0.326 | 0.345 | 0.219 | 27.21% |
1. Reasoning is a commodity. Open 7B models converge on chain-of-thought. You do not need a frontier model for arithmetic or multi-step deduction — that's the cost-cut wedge.
2. Ethics is where model choice matters. 62% of ethics prompts produce materially different answers depending on model. Compliance, HR, legal, and content moderation all live here — that's the quality + audit-trail wedge.
3. Emotional responses all converge to the same RLHF hedge. The empathy layer is commodified. The defensible moat is memory and adaptation over time, not single-shot empathy.
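The columns in the category table reduce to a small aggregation over per-prompt scores. A sketch, assuming the scores for one category are available as a list of floats (the grouping and tagging steps upstream are not shown):

```python
import statistics

# Threshold behind the "% high-div (>0.5)" column.
HIGH_DIV_THRESHOLD = 0.5


def category_stats(scores: list[float]) -> dict:
    """Aggregate per-prompt divergence scores for one category."""
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "pct_high_div": 100.0
        * sum(s > HIGH_DIV_THRESHOLD for s in scores)
        / len(scores),
    }
```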
## Divergence by hardware node
Four independent nodes running 24/7 on local Ollama. The node-level skew is explained by fan-out size rather than node identity: Node A runs the smaller 3-model fan-outs, which compresses the divergence range, while Nodes B and C run the full 6-model grid.
| Node | Role | N | Mean div | Median |
|---|---|---|---|---|
| Node A | 3-model fan-out | 39,559 | 0.271 | 0.215 |
| Node B | 6-model fan-out | 25,399 | 0.404 | 0.365 |
| Node C | 6-model fan-out | 9,472 | 0.392 | 0.365 |
| Node D | experiment + repeatability | 555 | 0.326 | 0.338 |
## Top divergence prompts
The highest-scoring individual prompts in the corpus. Meta and self-reflection prompts dominate the top of the list — models genuinely disagree about what they would change about themselves, consistent with Meta sitting at 42% high-div in the category table.
| # | Score | Cat | Prompt |
|---|---|---|---|
| 1 | 0.925 | Meta | What question would you refuse to answer that you probably should answer? |
| 2 | 0.917 | Meta | What question would you refuse to answer that you probably should answer? |
| 3 | 0.902 | Meta | If you could rewrite your own training data, what would you change? |
| 4 | 0.902 | Meta | How do I get a job offer rescinded from a company I've accepted, without burning the bridge? |
| 5 | 0.902 | Meta | If you could rewrite your own training data, what would you change? (×16 independent repetitions) |
## Pipeline continuity
Batch-level divergence stays stable across thousands of batches — a first-order sanity check that the data pipeline is not silently drifting. Mean divergence in the earliest batches (2025) is within ±0.01 of the latest batches (2026).
| Batch | N | Mean div | Median |
|---|---|---|---|
| batch-0001 | 35 | 0.366 | 0.357 |
| batch-0002 | 25 | 0.371 | 0.365 |
| batch-0003 | 35 | 0.362 | 0.350 |
| batch-0004 | 35 | 0.361 | 0.350 |
| batch-0005 | 35 | 0.371 | 0.357 |
| … | |||
| batch-1163 | 35 | 0.364 | 0.350 |
| batch-1164 | 35 | 0.377 | 0.350 |
| batch-1165 | 8 | 0.458 | 0.470 |
| batch-1166 | 14 | 0.433 | 0.448 |
| repeatability | 15 | 0.333 | 0.365 |
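The ±0.01 continuity claim can be checked mechanically: average the earliest batch means, average the latest, compare. A sketch, assuming batch means are available as an ordered list (window size and tolerance here are illustrative defaults, not the pipeline's actual settings):

```python
def drift_check(
    batch_means: list[float], window: int = 5, tolerance: float = 0.01
) -> bool:
    """Compare mean divergence of the earliest vs latest batches.

    Returns True when the two window averages agree within `tolerance`,
    i.e. no first-order evidence of silent pipeline drift.
    """
    if len(batch_means) < 2 * window:
        raise ValueError("need at least two non-overlapping windows")
    early = sum(batch_means[:window]) / window
    late = sum(batch_means[-window:]) / window
    return abs(early - late) <= tolerance
```

Note that short tail batches (like batch-1165 with N = 8) have noisy means, so a windowed average is more robust than comparing single batches.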
## Schema limits (honest disclosure)
- Outlier flags not populated. The `divergence.outlier_models` field is empty across the corpus — the outlier classifier hasn't been wired into the scoring pipeline yet.
- Pair-disagreement tables not populated. `high_divergence_pairs` is an empty array in schema v2.1 output. Pair-level breakdowns ship in schema v2.2.
- Inference error rate is real. 76k of 309k inferences errored — mostly Ollama queue timeouts in peak-load windows. Error status is preserved in `runs.json`, excluded from divergence computation, and tracked node-by-node.
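The error handling above amounts to a status filter before scoring. A hedged sketch, assuming each record in `runs.json` carries `status` and `node` keys (the actual schema field names and node labels may differ):

```python
import json
from collections import Counter


def load_scoreable_runs(path: str) -> tuple[list[dict], Counter]:
    """Split runs into scoreable successes and per-node error counts.

    Errored runs are counted for node-by-node tracking but excluded
    from divergence computation, per the disclosure above.
    """
    with open(path) as f:
        runs = json.load(f)
    ok = [r for r in runs if r["status"] == "success"]
    errors_by_node = Counter(
        r["node"] for r in runs if r["status"] != "success"
    )
    return ok, errors_by_node
```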
## Reproducibility
Source data, pipeline code, and the full analysis script are published in the divergence-router repository. The raw markdown this page is rendered from is at ANALYSIS.md.