A fourth hardware node joined the fleet, dedicated to repeatability runs and one-off experiments without blocking the primary pipelines. It is a small slice of throughput by volume; its primary role is statistical tightening, not corpus scale.
Pipeline expansion & corpus evolution
What's live, what's landing next, and what the pipeline did between snapshots. Updated whenever new models, new schema, or new corpus milestones ship.
The archon-memory-core benchmark plots top-1 accuracy across a simulated 90-day horizon with two confuser waves. The LangChain 32k dump is split into two tracks to show "answer exists in context" vs "top-ranked chunk the LLM attends to": the v2.3 thesis made visible.
- 3-seed grid at scale L — 250 queries × 2,300 confusers
- archon-memory-core (tuned) holds at 99.2% through day 90; retrieval-only drops to 49.2% after second confuser wave
Hero numbers and roadmap counters now pull from /stats.json on page load. The published snapshot refreshes from the producing nodes, so the site can narrate real throughput instead of frozen text.
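A minimal sketch of the snapshot-validation step implied here: the page should fall back to its static text when /stats.json is missing or malformed. The field names below are assumptions, not the published schema.

```python
import json

# Assumed counter fields; the real /stats.json keys are not documented here.
REQUIRED_KEYS = {"total_prompts", "total_inferences", "high_divergence_share"}

def counters_from_snapshot(raw: str):
    """Parse a /stats.json payload. Returns None on any problem so the
    caller can keep the frozen fallback text instead of showing garbage."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not REQUIRED_KEYS.issubset(data):
        return None
    return {k: data[k] for k in REQUIRED_KEYS}
```

Returning `None` rather than raising keeps the page render path simple: one check decides between live counters and the static copy.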
Full corpus breakdown at /analysis/: 8-category divergence table, four-node throughput, top-5 high-divergence prompts, batch continuity, and schema-limits disclosure. Regenerated when new corpus snapshots land.
The current v2.1 schema stores prompt-level divergence; v2.2 will add explicit pair-level disagreement matrices per prompt, so routers can query "which two models disagree on code-reasoning prompts longer than 512 tokens" directly. Included in the Commercial tier at no extra cost.
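A sketch of what that query could look like against a v2.2-style record. The record shape (`category`, `prompt_tokens`, `pair_divergence` keyed by model pair) and the 0.5 threshold default are illustrative assumptions, not the published schema.

```python
def disagreeing_pairs(record, category, min_tokens, threshold=0.5):
    """Return model pairs whose divergence clears `threshold` for a prompt
    in `category` that is longer than `min_tokens`."""
    if record["category"] != category or record["prompt_tokens"] <= min_tokens:
        return []
    return [pair for pair, score in record["pair_divergence"].items()
            if score >= threshold]

# Hypothetical per-prompt record with a pair-level disagreement matrix.
record = {
    "category": "code-reasoning",
    "prompt_tokens": 640,
    "pair_divergence": {
        ("model-x", "model-y"): 0.72,
        ("model-x", "model-z"): 0.31,
        ("model-y", "model-z"): 0.55,
    },
}
print(disagreeing_pairs(record, "code-reasoning", 512))
# [('model-x', 'model-y'), ('model-y', 'model-z')]
```

The point of storing the matrix explicitly is exactly this: the filter is a dictionary scan per prompt, with no need to recompute divergence from raw outputs.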
A REST endpoint that takes a prompt and returns a recommended open-weights model (and optional fallback) based on live divergence data. Beta opens to Commercial-tier customers first.
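A sketch of the routing core that a response like this could wrap, assuming the service has already mapped the prompt to per-model scores for its nearest category. The scoring source, model names, and response keys are all assumptions.

```python
def recommend(model_scores):
    """Rank candidate models by score (higher is better) and return a
    primary pick plus an optional fallback, mirroring an assumed
    {"model": ..., "fallback": ...} response shape."""
    ranked = sorted(model_scores, key=model_scores.get, reverse=True)
    return {"model": ranked[0], "fallback": ranked[1] if len(ranked) > 1 else None}

print(recommend({"model-a": 0.91, "model-b": 0.88, "model-c": 0.64}))
# {'model': 'model-a', 'fallback': 'model-b'}
```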
Each notable open-weights release enters the pipeline within a week of its drop. 90-day onboarding queue:
- Qwen 3 (7B / 14B / 32B) · DeepSeek V3 · Llama 4 · GLM 4.6
- Granite 3 · Phi-4 · Command R+ · Nous Hermes 3
- OLMo 2 · Tülu 3 · Yi 1.5 + Coder · StarCoder 2 · Mistral Large-Instruct
Today's 13-model fan-out is a first-corpus design choice, not a platform limit. Target: 50+ model variants by Q1 2027.
Combined four-node output crossed 80,000 prompts, with roughly 332k total model inferences across 8 reasoning categories. 29% of prompts scored as high-divergence (pairwise divergence ≥ 0.5), the empirical threshold above which routing pays off.
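The 29% figure reduces to a simple threshold count. A minimal sketch, assuming each prompt is summarized by its maximum pairwise divergence (the aggregation rule is an assumption):

```python
def high_divergence_share(max_pairwise_divergence, threshold=0.5):
    """Fraction of prompts whose max pairwise divergence clears the
    routing threshold. Input: one summary number per prompt."""
    if not max_pairwise_divergence:
        return 0.0
    flagged = sum(1 for d in max_pairwise_divergence if d >= threshold)
    return flagged / len(max_pairwise_divergence)

print(high_divergence_share([0.10, 0.62, 0.50, 0.23]))  # 0.5
```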
Tagged preview release of the memory benchmark with the consolidation-aware adapter. Preregistered v2.1 harness: 6-checkpoint day grid, contradiction-injection scenarios, confuser wave at day 14. Public repo at atw4757-byte/archon-memory-core.
External-facing data surfaces switched to Node A / B / C / D labels with batch-NNNN identifiers. Internal hostnames removed from the site and public repo. Schema fields preserved; only display labels changed.
Stabilized JSONL schema: per-prompt metadata, per-model inference output, pairwise divergence scores, category tags. Ships as the Evaluation-tier snapshot today and as the ongoing feed for Commercial-tier subscribers.
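An illustrative single JSONL record mirroring the four components listed above (per-prompt metadata, per-model outputs, pairwise divergence, category tags). Every key name and the identifier format are hypothetical, not the published schema.

```python
import json

# Hypothetical record; key names and the batch identifier are assumptions.
line = json.dumps({
    "prompt_id": "batch-0142:00073",
    "category": "code-reasoning",
    "metadata": {"prompt_tokens": 640},
    "outputs": {"model-a": "...", "model-b": "..."},
    "pairwise_divergence": {"model-a|model-b": 0.61},
    "tags": ["long-context"],
})

# One json.loads per line is the whole parser for a JSONL feed.
record = json.loads(line)
print(record["pairwise_divergence"]["model-a|model-b"])  # 0.61
```

One self-contained JSON object per line is what makes the same file work as both a static Evaluation-tier snapshot and an append-only Commercial-tier feed.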
Nodes A, B, and C run continuous fan-out against 7-model rotations. The first full daily analysis covered 60k+ prompts, marking the transition from spike collection to continuous corpus production.