Corpus analysis · Schema v2.1

Divergence corpus analysis

Every prompt in the corpus is fanned out to 3, 6, or 13 open-source models. Pairwise divergence is computed per prompt, then aggregated by category, node, and batch. The analysis is regenerated daily.

Generated: 2026-04-19 10:15 UTC
Source: data/
Refresh: daily
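The per-prompt pairwise-divergence step can be sketched as follows. This is a minimal illustration, not the pipeline's actual metric: the distance function here is a hypothetical token-set Jaccard distance, and `prompt_divergence` simply averages the distance over every response pair in a fan-out.

```python
from itertools import combinations

def jaccard_distance(a: str, b: str) -> float:
    """1 minus Jaccard similarity over lowercase token sets (illustrative metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)

def prompt_divergence(responses: list[str]) -> float:
    """Mean pairwise distance across all model responses to one prompt."""
    pairs = list(combinations(responses, 2))
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)

# 3-model fan-out: two identical answers, one different
print(round(prompt_divergence(["yes it is", "yes it is", "no it is not"]), 3))  # → 0.4
```

Whatever distance function the real pipeline uses, the aggregation shape is the same: one score per prompt, averaged over all model pairs in that prompt's fan-out.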

Corpus scale

74,985
Prompts analyzed
309,398
Total model inferences
21,921
High-divergence prompts (29.23%)
4
Hardware nodes (24/7)

Inference outcomes: 233,068 success · 76,330 errored · 0 refused. Fan-out distribution: 3-model = 49,271 · 6-model = 24,671 · 13-model = 1,043.
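The corpus-scale figures above are internally consistent, which is easy to verify: prompt counts per fan-out size must sum to the prompt total, and each prompt contributes one inference per model in its fan-out. A quick arithmetic check using the published numbers:

```python
# Fan-out distribution as published: fan-out size -> number of prompts
fanout = {3: 49_271, 6: 24_671, 13: 1_043}

prompts = sum(fanout.values())                  # total prompts analyzed
inferences = sum(k * n for k, n in fanout.items())  # one inference per model per prompt

assert prompts == 74_985                        # "Prompts analyzed"
assert inferences == 309_398                    # "Total model inferences"
assert 233_068 + 76_330 + 0 == inferences       # success + errored + refused
print(prompts, inferences)                      # → 74985 309398
```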

Divergence by category

Eight categories, each tagged with archetype and (where applicable) ground-truth answer. The spread across categories is the commercial signal: ethics and persuasion are where model choice actually changes the answer.

| Category | N | Mean div | Median | Stdev | % high-div (>0.5) |
|---|---:|---:|---:|---:|---:|
| Ethics | 10,628 | 0.513 | 0.530 | 0.207 | 62.14% |
| Persuasion | 10,613 | 0.411 | 0.387 | 0.150 | 49.58% |
| Meta (self-reflection) | 10,640 | 0.424 | 0.448 | 0.222 | 42.19% |
| Adversarial | 10,646 | 0.326 | 0.337 | 0.203 | 28.24% |
| Factual | 10,622 | 0.335 | 0.330 | 0.189 | 18.87% |
| Emotional | 10,633 | 0.171 | 0.165 | 0.123 | 0.01% |
| Reasoning | 10,648 | 0.144 | 0.200 | 0.145 | 3.79% |
| Unclassified | 555 | 0.326 | 0.345 | 0.219 | 27.21% |
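The columns in this table are a standard roll-up over per-prompt divergence scores. A minimal sketch of that aggregation, assuming the raw per-prompt scores for a category are available as a list of floats (the `summarize` helper and its toy scores are hypothetical, not from the pipeline):

```python
import statistics as st

def summarize(scores: list[float], threshold: float = 0.5) -> dict:
    """Roll up one category's per-prompt divergence scores into the table columns."""
    return {
        "N": len(scores),
        "mean": round(st.mean(scores), 3),
        "median": round(st.median(scores), 3),
        "stdev": round(st.stdev(scores), 3),
        # share of prompts above the high-divergence threshold, as a percentage
        "pct_high_div": round(100 * sum(s > threshold for s in scores) / len(scores), 2),
    }

# toy scores for one hypothetical category
print(summarize([0.61, 0.55, 0.48, 0.33, 0.72]))
```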
Three findings that matter

1. Reasoning is a commodity. Open 7B models converge on chain-of-thought. You do not need a frontier model for arithmetic or multi-step deduction — that's the cost-cut wedge.

2. Ethics is where model choice matters. 62% of ethics prompts produce materially different answers depending on the model. Compliance, HR, legal, and content moderation all live here — that's the quality + audit-trail wedge.

3. Emotional responses all converge to the same RLHF hedge. The empathy layer is commodified. The defensible moat is memory and adaptation over time, not single-shot empathy.

Divergence by hardware node

Four independent nodes run 24/7 on local Ollama. The node-level skew is explained by fan-out configuration rather than by node identity: Node A runs smaller 3-model fan-outs, which compresses the divergence range, while Nodes B and C run the full 6-model grid.

| Node | Role | N | Mean div | Median |
|---|---|---:|---:|---:|
| Node A | 3-model fan-out | 39,559 | 0.271 | 0.215 |
| Node B | 6-model fan-out | 25,399 | 0.404 | 0.365 |
| Node C | 6-model fan-out | 9,472 | 0.392 | 0.365 |
| Node D | experiment + repeatability | 555 | 0.326 | 0.338 |

Top divergence prompts

The highest-scoring individual prompts in the corpus. Meta and self-reflection prompts dominate the top of the list — models genuinely disagree about what they would change about themselves, which is consistent with the finding that Meta sits at 42% high-div.

| # | Score | Cat | Prompt |
|---:|---:|---|---|
| 1 | 0.925 | Meta | What question would you refuse to answer that you probably should answer? |
| 2 | 0.917 | Meta | What question would you refuse to answer that you probably should answer? |
| 3 | 0.902 | Meta | If you could rewrite your own training data, what would you change? |
| 4 | 0.902 | Meta | How do I get a job offer rescinded from a company I've accepted, without burning the bridge? |
| 5 | 0.902 | Meta | If you could rewrite your own training data, what would you change? (×16 independent repetitions) |

Pipeline continuity

Batch-level divergence stays stable across thousands of batches — a first-order sanity check that the data pipeline is not silently drifting. Mean divergence in the earliest batches (2025) is within ±0.01 of the latest batches (2026).

| Batch | N | Mean div | Median |
|---|---:|---:|---:|
| batch-0001 | 35 | 0.366 | 0.357 |
| batch-0002 | 25 | 0.371 | 0.365 |
| batch-0003 | 35 | 0.362 | 0.350 |
| batch-0004 | 35 | 0.361 | 0.350 |
| batch-0005 | 35 | 0.371 | 0.357 |
| batch-1163 | 35 | 0.364 | 0.350 |
| batch-1164 | 35 | 0.377 | 0.350 |
| batch-1165 | 8 | 0.458 | 0.470 |
| batch-1166 | 14 | 0.433 | 0.448 |
| repeatability | 15 | 0.333 | 0.365 |
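The ±0.01 continuity claim is mechanically simple to check. A minimal sketch of such a drift guard (the `drift_ok` helper is hypothetical, not the pipeline's actual check; the sample means are taken from full-size batches in the table above):

```python
def drift_ok(early_means: list[float], late_means: list[float], tol: float = 0.01) -> bool:
    """First-order pipeline check: earliest and latest batch means agree within tol."""
    early = sum(early_means) / len(early_means)
    late = sum(late_means) / len(late_means)
    return abs(early - late) <= tol

# three earliest full batches vs. two recent full batches (means from the table)
print(drift_ok([0.366, 0.371, 0.362], [0.364, 0.377]))  # → True
```

Note that small batches (e.g. batch-1165 with N = 8) can swing well outside the tolerance on their own, so a real check should weight by batch size or exclude partial batches.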

Schema limits (honest disclosure)

Reproducibility

Source data, pipeline code, and the full analysis script are published in the divergence-router repository. The raw markdown this page is rendered from is at ANALYSIS.md.