Changelog

Pipeline expansion & corpus evolution

What's live, what's landing next, and what the pipeline did between snapshots. Updated whenever new models, schema revisions, or corpus milestones ship.

In flight
AMB v2.3 time-series chart published live

Agent-memory-core benchmark plots top-1 accuracy across a simulated 90-day horizon with two confuser waves. LangChain 32k dump split into two tracks to show "answer exists in context" vs "top-ranked chunk the LLM attends to" — the v2.3 thesis made visible.

Live stats hydration live

Hero numbers and roadmap counters now pull from /stats.json on page load. The published snapshot refreshes from the producing nodes, so the site can narrate real throughput instead of frozen text.
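A minimal sketch of the hydration step described above. The field names (`total_prompts`, `total_inferences`) and the template syntax are illustrative assumptions, not the actual /stats.json schema.

```python
import json

def hydrate(template: str, stats_json: str) -> str:
    """Replace {placeholder} tokens in static copy with live counters.

    Field names here are hypothetical stand-ins for the real snapshot keys.
    """
    stats = json.loads(stats_json)
    return template.format(**stats)

# Simulated /stats.json payload (values from the current changelog copy).
snapshot = '{"total_prompts": 80000, "total_inferences": 332000}'
hero = "Corpus: {total_prompts} prompts / {total_inferences} inferences"
print(hydrate(hero, snapshot))
# → Corpus: 80000 prompts / 332000 inferences
```

The same substitution runs for every counter on the page, so stale numbers can only come from a stale snapshot, not from the copy itself.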

Public analysis page shipped live

Full corpus breakdown at /analysis/: 8-category divergence table, four-node throughput, top-5 high-divergence prompts, batch continuity, and schema-limits disclosure. Regenerated when new corpus snapshots land.

Planned · next 30 days
Schema v2.2 — pair-level disagreement tables planned

Current v2.1 stores prompt-level divergence; v2.2 will add explicit pair-level disagreement matrices per prompt, so routers can query "which two models disagree on code-reasoning prompts longer than 512 tokens" directly. Included in Commercial tier at no extra cost.
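A sketch of the pair-level query v2.2 is meant to unlock. The record fields (`pair_divergence`, `category`, `prompt_tokens`) mirror the prompt-level schema but are assumptions here, as is the 0.5 cutoff reused from the high-divergence definition.

```python
def disagreeing_pairs(records, category, min_tokens, threshold=0.5):
    """Return model pairs whose divergence crosses `threshold`
    on prompts in `category` longer than `min_tokens` tokens."""
    hits = set()
    for r in records:
        if r["category"] != category or r["prompt_tokens"] <= min_tokens:
            continue
        for pair, score in r["pair_divergence"].items():
            if score >= threshold:
                hits.add(pair)
    return sorted(hits)

# Hypothetical v2.2-style records with explicit per-pair scores.
records = [
    {"prompt_id": "p1", "category": "code-reasoning", "prompt_tokens": 640,
     "pair_divergence": {("model-a", "model-b"): 0.72,
                         ("model-a", "model-c"): 0.31,
                         ("model-b", "model-c"): 0.55}},
    {"prompt_id": "p2", "category": "code-reasoning", "prompt_tokens": 200,
     "pair_divergence": {("model-a", "model-b"): 0.90}},
]

print(disagreeing_pairs(records, "code-reasoning", 512))
# → [('model-a', 'model-b'), ('model-b', 'model-c')]
```

With v2.1's prompt-level scores this query needs a full re-scan of model outputs; storing the matrix per prompt turns it into a filter plus lookup.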

Router API beta planned

REST endpoint that takes a prompt, returns a recommended open-weights model (and optional fallback) based on live divergence data. Beta opens to Commercial-tier customers first.
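A rough sketch of the recommendation logic such an endpoint could run. The response shape, category tagging, and per-category win-rate table are all assumptions, not the real API.

```python
def route(prompt_category: str, divergence_table: dict) -> dict:
    """Pick the strongest model for a category, plus an optional fallback.

    `divergence_table` maps category -> {model: win_rate}; both the table
    and the response keys are hypothetical.
    """
    ranked = sorted(divergence_table.get(prompt_category, {}).items(),
                    key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return {"model": None, "fallback": None}
    return {"model": ranked[0][0],
            "fallback": ranked[1][0] if len(ranked) > 1 else None}

table = {"code-reasoning": {"model-a": 0.81, "model-b": 0.77, "model-c": 0.64}}
print(route("code-reasoning", table))
# → {'model': 'model-a', 'fallback': 'model-b'}
```

The beta would wrap this in a REST layer; the interesting part is that the table refreshes from live divergence data rather than a static leaderboard.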

Model family onboarding planned

Each notable open-weights release enters the pipeline within a week of its drop, feeding a rolling 90-day onboarding queue.

Today's 13-model fan-out is a first-corpus design choice, not a platform limit. Target: 50+ model variants by Q1 2027.

Shipped
Divergence corpus crosses 80k prompts milestone

Combined four-node output crossed 80,000 prompts, with roughly 332k total model inferences across 8 reasoning categories. 29% of prompts scored as high-divergence (pairwise divergence ≥ 0.5), the empirical threshold above which routing pays off.
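The high-divergence share is a simple cut on per-prompt scores; a sketch with fabricated stand-in scores (only the 0.5 cutoff comes from the corpus definition):

```python
def high_divergence_share(max_pairwise_scores, threshold=0.5):
    """Fraction of prompts whose max pairwise divergence clears the cutoff."""
    hits = sum(1 for s in max_pairwise_scores if s >= threshold)
    return hits / len(max_pairwise_scores)

# Hypothetical per-prompt maxima, not corpus data.
scores = [0.10, 0.30, 0.62, 0.55, 0.20, 0.48, 0.71, 0.90, 0.05, 0.33]
print(f"{high_divergence_share(scores):.0%}")
# → 40%
```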

Node D (experiment + repeatability) online

Fourth hardware node joined the fleet, dedicated to repeatability runs and one-off experiments without blocking the primary pipelines. Small slice of throughput by volume; primary role is statistical tightening, not corpus scale.

Agent-memory-core v0.2.1 — AMB v2 preview

Tagged preview release of the memory benchmark with the consolidation-aware adapter. Preregistered v2.1 harness: 6-checkpoint day grid, contradiction-injection scenarios, confuser wave at day 14. Public repo at atw4757-byte/archon-memory-core.

Hardware node provenance anonymized

External-facing data surfaces switched to Node A / B / C / D labels with batch-NNNN identifiers. Internal hostnames removed from the site and public repo. Schema fields preserved; only display labels changed.

Schema v2.1 — prompt-level divergence baseline

Stabilized JSONL schema: per-prompt metadata, per-model inference output, pairwise divergence scores, category tags. Ships as the Evaluation-tier snapshot today and as the ongoing feed for Commercial-tier subscribers.
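An illustrative v2.1-style record showing the layering the entry describes: per-prompt metadata, per-model output, pairwise scores. Field names and the batch identifier are assumptions; the published schema may differ.

```python
import json

# One JSON object per JSONL line; all names here are hypothetical.
line = json.dumps({
    "prompt_id": "batch-0042/p-00017",
    "category": "code-reasoning",
    "models": {"model-a": {"output": "..."}, "model-b": {"output": "..."}},
    "pairwise_divergence": {"model-a|model-b": 0.57},
})

record = json.loads(line)
print(record["category"])
# → code-reasoning
```

Keeping divergence scores inline with the outputs is what lets the snapshot ship as a single feed for both Evaluation and Commercial tiers.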

Three-node pipeline operational milestone

Nodes A, B, and C running continuous fan-out against 7-model rotations. First full daily analysis covered 60k+ prompts. Marked the transition from spike collection to continuous corpus production.