May 15, 2026

Tribal Knowledge Is the Bottleneck. Not Your Model.

Tribal knowledge in AI agents is the true bottleneck in enterprise agentic deployments — not model capability, not toolchain, not compute budget. Affirm proved it in February 2026, and the research has been quietly building the same case for over a year.

Affirm Retooled in One Week. Here's What Actually Did the Work.

Affirm's AI Retooling Week was not a hackathon or a pilot — it was an organizational forcing function that exposed the real constraint in agentic software development.

In February 2026, Affirm suspended all non-essential meetings, delayed product delivery dates, and required every engineer and manager to complete a fully agentic workflow — from task to submitted PR — by week's end. The compressed timeline was deliberate: the gap between engineers already using agentic tools effectively and their peers was widening, and the company wanted to close it fast.

The numbers were striking. As of April 2026, over 60% of PRs at Affirm are agent-assisted — up from near zero four months earlier. Weekly merged-PR volume is up 58% year over year, and climbing. Few engineering organizations have publicly reported a comparable step-change at that pace.

But the headline numbers obscure the more important finding.

The working group of nine engineers assembled before Retooling Week had a precise mandate: produce a repeatable agentic workflow that lets the average Affirm developer automate most of their coding work without bespoke setup or expert knowledge. The infrastructure they built centered on a layered system of context files — architectural decisions, domain knowledge, team conventions — stored where agents could retrieve them before implementation started, so output quality rose and review burden fell. They also built a central marketplace of team-level skills: structured, machine-readable institutional knowledge that agents consumed before generating any output.
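
To make the pattern concrete, here is a minimal sketch of retrieval-before-generation in Python. The file paths, layer names, and ordering are illustrative assumptions, not Affirm's actual implementation:

```python
from pathlib import Path

# Illustrative layering order: broadest context first, most specific last.
# These file paths are hypothetical, not Affirm's actual layout.
CONTEXT_LAYERS = [
    Path("docs/context/architecture.md"),      # org-wide architectural decisions
    Path("docs/context/domain.md"),            # domain knowledge for this service
    Path("docs/context/team-conventions.md"),  # team-level coding conventions
]

def assemble_context(task_description: str) -> str:
    """Prepend every available context layer to the task, so the agent
    reads institutional knowledge before it writes any code."""
    sections = []
    for layer in CONTEXT_LAYERS:
        if layer.exists():
            sections.append(f"## {layer.stem}\n{layer.read_text()}")
    sections.append(f"## task\n{task_description}")
    return "\n\n".join(sections)

prompt = assemble_context("Add idempotency keys to the refund endpoint.")
# `prompt` now carries the conventions and rationale the agent would
# otherwise have to guess at: the tribal knowledge, made legible.
```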

Agentic coding amplified every existing friction point in the development pipeline. When code generation is near-instant, inaccessible documentation and weak local testing go from annoyances developers work around to showstoppers.

The lesson was direct: the limiting factor had always been engineering cycles, not ideas. AI unlocked those cycles specifically because domain context was made legible to the agent. Claude Code was the tool. Structured knowledge was the engine.

Why Tribal Knowledge Defeats Agentic AI at Enterprise Scale

The productivity paradox in agentic engineering is not a perception problem. It is a measurement problem — and the measurements are damning.

75% of engineers now use AI coding tools, yet most organizations report no measurable org-level performance gains, per the Faros AI Productivity Paradox Report. The sharpest evidence comes from METR: in a randomized controlled trial, 16 developers with an average of five years' experience completed 246 tasks on mature projects they knew intimately. Developers predicted AI would cut completion time by 24%; post-study, they estimated a 20% improvement. The measured result was a 19% increase — AI tooling slowed them down.

That 43-point gap between perception and reality (a predicted 24% speedup against a measured 19% slowdown) should end any reliance on anecdote as evidence of AI productivity.

The pattern repeats across data sources. On high-AI-adoption teams, developers merge far more PRs — but PR review time increases sharply. The bottleneck doesn't disappear; it migrates from code generation to knowledge validation. Bain's Technology Report 2025 found two in three software firms have rolled out generative AI, yet 10–15% productivity gains rarely convert to business value because time saved isn't redirected toward higher-value work.

The model is not the variable. The system is.

METR's results indicate AI performs worse in settings with very high quality standards and many implicit requirements — documentation conventions, testing coverage, architectural rationale — that humans take substantial time to internalize. That describes every mature enterprise codebase. As one agentic engineering team put it, "if you don't fully describe what the intent is, you get random results." That specification gap is the tribal knowledge problem. Swapping models doesn't close it.

Individual output rises. Organizational throughput stalls. The culprit is consistently unstructured, fragmented institutional knowledge sitting beneath the agent layer.

The Memory Supercycle Makes Every Bad Agent Inference More Expensive

This problem becomes financially urgent in 2026, not just technically inconvenient.

AI is projected to consume approximately 20% of global DRAM wafer capacity in 2026, with each gigabyte of HBM consuming roughly 3× the wafer capacity of standard DDR5. That structural demand shift is landing on supply that was already constrained. DRAM prices surged approximately 90% in Q1 2026 versus Q4 2025 — the sharpest quarterly spike in memory market history, per Counterpoint Research and TrendForce. Micron is sold out of HBM through all of 2026; SK Hynix has booked its entire advanced packaging capacity through year-end.

Every RAG retrieval, every agentic reasoning step, every inference call runs on this increasingly scarce substrate.

The HBM market TAM is forecast to reach $100 billion by 2028 — larger than the entire DRAM market was in 2024. Relief on conventional memory supply is not expected before 2027. PC makers including Dell, Lenovo, and HP have already signaled 15–20% hardware price increases in 2026 as memory costs propagate downstream — a direct tax on enterprise AI infrastructure budgets.

The enterprise implication is arithmetic: agents grounded in stale, conflicted, or fragmented knowledge burn expensive compute producing wrong answers. A hallucinated response is not just a trust failure — it's a compute bill for an inference that should never have run. Knowledge quality is now a cost control lever, not a quality assurance concern.
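
The arithmetic is easy to sketch. Every figure below is an assumption chosen for illustration, not a measured rate or vendor price; the point is that spend wasted on badly grounded calls scales linearly with knowledge quality:

```python
# Back-of-envelope cost of inference grounded in bad knowledge.
# Every number here is an illustrative assumption, not vendor pricing.
CALLS_PER_DAY = 50_000        # agentic inference calls across the org
TOKENS_PER_CALL = 4_000       # prompt + retrieved context + output
COST_PER_1K_TOKENS = 0.01     # dollars, assumed blended rate
BAD_KNOWLEDGE_RATE = 0.08     # share of calls grounded in stale/conflicted docs

daily_spend = CALLS_PER_DAY * TOKENS_PER_CALL / 1_000 * COST_PER_1K_TOKENS
wasted = daily_spend * BAD_KNOWLEDGE_RATE

print(f"daily inference spend:  ${daily_spend:,.0f}")   # $2,000
print(f"burned on bad answers:  ${wasted:,.0f}/day, ${wasted * 365:,.0f}/yr")
```

And that understates it: each bad answer also triggers human review and rework that never shows up on the inference bill.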

What the Fix Looks Like: Structured Knowledge, Not a Better Model

The organizations seeing compounding returns from agentic AI share one characteristic: they made their institutional knowledge machine-readable before they deployed agents.

Goldman Sachs fine-tuned its internal development platform on its own codebase and project documentation, delivering context-aware acceleration that generic tools cannot match without that institutional grounding. Affirm built a layered context architecture so every agent in every repo accessed team-level conventions and domain decisions before writing a line. These are not model choices. They are knowledge governance choices.

KPMG's Q1 2026 AI Pulse research found 65% of enterprise leaders cite difficulty scaling AI use cases as their primary ROI barrier — up from 33% the prior quarter — while data readiness has emerged as the top deployment challenge for technology leaders. The perception gap is closing: the constraint is the knowledge layer, not the reasoning layer.

57% of executives now expect people to manage and direct AI agents, shifting the human role from coder to knowledge curator and validator. But that shift requires something to curate. Only 1 in 5 companies has a mature governance model for autonomous AI agents, per Deloitte's 2026 State of AI report — meaning four in five are deploying agents on knowledge foundations they have never audited.

That is not a model problem. It is knowledge debt.

The fix follows a clear sequence: surface conflicts and gaps in your systems of record before agents consume them; remediate stale or contradictory content automatically; unify everything into a single queryable layer every agent reads from — whether it's pulling from Salesforce, ServiceNow, Zendesk, or Confluence; then monitor continuously as the organization evolves. Human Delta's knowledge infrastructure delivers exactly that — an audit-to-remediation pipeline requiring no code changes and returning results in under 24 hours. In documented enterprise scans, teams surface thousands of issues in the first pass — issues that existed in systems of record long before any agent went live. They just weren't visible.
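
As an illustration of the audit step alone, a staleness-and-conflict pass over exported documents can be sketched in a few lines. This is a toy under an assumed document shape, not Human Delta's pipeline:

```python
from datetime import datetime, timedelta

# Toy knowledge audit: flag stale entries and mutually contradictory claims.
# The document shape (title, updated, claims) is an assumed export format,
# not the schema of any particular system of record.
STALE_AFTER = timedelta(days=365)

docs = [
    {"title": "Refund SLA", "updated": datetime(2024, 1, 10),
     "claims": {"refund_window_days": 30}},
    {"title": "Support macro", "updated": datetime(2025, 11, 2),
     "claims": {"refund_window_days": 14}},
]

def audit(documents, now):
    findings, seen = [], {}
    for doc in documents:
        if now - doc["updated"] > STALE_AFTER:
            findings.append(("stale", doc["title"]))
        for key, value in doc["claims"].items():
            if key in seen and seen[key][1] != value:
                # Two records answer the same question differently: exactly
                # the conflict an agent surfaces as a confident wrong answer.
                findings.append(("conflict", key, seen[key][0], doc["title"]))
            seen[key] = (doc["title"], value)
    return findings

for finding in audit(docs, now=datetime(2026, 2, 1)):
    print(finding)
# ('stale', 'Refund SLA')
# ('conflict', 'refund_window_days', 'Refund SLA', 'Support macro')
```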

Affirm came out of Retooling Week with a clear list of problems and the organizational will to fix them, launching a dedicated program focused on context, enablement, and validation. The principle is the one the working group proved: better inputs before the agent starts. If architectural decisions, domain context, and team conventions are stored where agents can find them, output quality rises and review burden falls.

The model was never the problem. It was always the knowledge going in.

Common questions

Why do AI coding tools slow experienced developers down?

METR's randomized controlled trial found AI tools increased task completion time by 19% for experienced developers — primarily because mature codebases carry enormous implicit context, conventions, and architectural decisions that agents cannot access without structured documentation.

What is tribal knowledge in the context of AI agents?

Tribal knowledge is institutional context — team decisions, domain conventions, architectural rationale, undocumented rules — that exists in engineers' heads but is not structured in a format agents can consume. Without it, agents produce plausible-looking output that violates real constraints.

Doesn't fine-tuning a model on internal data solve the problem?

Fine-tuning helps at the margins but doesn't solve the underlying problem. Knowledge bases decay, develop conflicts, and accumulate coverage gaps continuously. A fine-tuned model trained on stale or contradictory data inherits those problems. The durable fix is a continuously maintained, validated knowledge layer that agents retrieve from at inference time.
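
A minimal sketch of that distinction, with `KnowledgeLayer`, `search`, and `llm` as hypothetical stand-ins rather than any real API: the model never changes, while the knowledge it reads can be corrected in place.

```python
# Inference-time retrieval: the model stays fixed, the knowledge layer is live.
# `KnowledgeLayer` and `answer` are hypothetical stand-ins, not a real API.
class KnowledgeLayer:
    def __init__(self, documents):
        # Assumes documents have already been audited and de-conflicted.
        self.documents = documents

    def search(self, query, k=3):
        # Toy relevance score: keyword overlap. Real systems use embeddings.
        words = query.lower().split()
        scored = sorted(self.documents,
                        key=lambda d: -sum(w in d.lower() for w in words))
        return scored[:k]

def answer(query, knowledge, llm):
    context = "\n".join(knowledge.search(query))
    # A fact corrected in the store is corrected for every future call,
    # with no retraining: the advantage over baking facts into weights.
    return llm(f"Context:\n{context}\n\nQuestion: {query}")
```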

What does the memory supercycle have to do with knowledge quality?

DRAM prices surged approximately 90% in Q1 2026. Every inference call — including every agentic reasoning step and RAG retrieval — runs on this more expensive compute. Agents that hallucinate because of stale knowledge waste that infrastructure on bad outputs at an increasingly high cost per token.

How quickly can an enterprise audit its knowledge layer?

Human Delta completes most enterprise knowledge scans in under 24 hours with no code changes required. In documented enterprise deployments, teams have surfaced thousands of issues in a single first scan — issues that existed in systems of record long before any agent went live.