The collapsing time horizon of code
Read original: Anthropic capability-curve slide, 2026
Decision this supports
How should we plan capability roadmaps against the observed agent time-horizon trend?
THE SOURCE
What was observed
Anthropic's capability curve compresses a year of engineering into hours. If your throughput is still measured in story points and sprints, you're benchmarking against the wrong baseline.
The capability curve

Agents complete tasks
Claude Code, multi-file PRs
An agent now plans, reads across the repo, edits dozens of files, runs tests, and hands back a reviewable PR. The engineer's job shifts from author to reviewer-in-the-loop.
Stated trajectory
Anthropic frames the curve as a roughly logarithmic compression of engineering work: between 2021 and 2026 the unit of work moves from autocomplete (seconds of work) to file edits, then to tasks, then to multi-hour project agents, all within five years.
MY EVALUATION
Verdict
Context-dependent. Each rung up this curve is not an incremental productivity win; it changes what an engineer is for, what a team looks like, and what is worth building. The signal is real and the slope is uncertain; the second-order effects on team shape are the part worth planning against.
Directional read
Conditions for adoption
- Plan against this signal when: your product roadmap is bottlenecked by engineering capacity rather than market discovery. The curve's throughput claim moves that constraint.
- Plan selectively when: the domain has tight ground truth (compilers, infra, data plumbing). Capability gains land there first; price the productivity in before hiring.
- Discount when: the domain is product judgement, novel UX, or regulated decisions. The slope is real but slower; treat it as a 24-month prior, not a quarter-by-quarter signal.
- Skip when: the team has fewer than three engineers and is pre-PMF. Capability planning is a distraction from finding the market.
Implications for startup decisions
- Headcount-driven roadmap planning becomes obsolete the moment one engineer can ship a 30-hour project per day. Re-plan around throughput, not capacity.
- Code review, test coverage, and architectural taste become the scarce resources, not the act of typing code. Invest in evaluation infra (rubrics, golden datasets, CI gates) before scaling agent usage.
- The surface area of "things worth attempting" expands. Side experiments, internal tools, and one-off automations now have positive ROI. Maintain a parking lot, not a backlog.
- Senior engineers move from authors to editors. Hire and promote on judgement, system design, and review velocity rather than raw output.
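To make the throughput-versus-capacity point concrete, here is a minimal sketch that treats delivery as a serial pipeline of authoring and review and reports which stage limits weekly output. The `Stage` type, the `bottleneck` helper, and every number are hypothetical illustrations, not figures from Anthropic's slide. The only assumption carried over from the text is that agents cut authoring time per task while review effort per task stays roughly fixed.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    hours_per_task: float   # hypothetical human hours spent per task at this stage
    hours_available: float  # hypothetical human hours per week the team spends here

    def tasks_per_week(self) -> float:
        return self.hours_available / self.hours_per_task


def bottleneck(stages: list[Stage]) -> Stage:
    # A serial pipeline can ship no faster than its slowest stage.
    return min(stages, key=lambda s: s.tasks_per_week())


# All numbers below are hypothetical, chosen only to illustrate the shift.
# Before: engineers author the change by hand, then review it.
before = [
    Stage("authoring", hours_per_task=30.0, hours_available=120.0),
    Stage("review", hours_per_task=3.0, hours_available=20.0),
]
# After: an agent drafts the change, so human authoring time per task collapses;
# review effort per task and the weekly hour split are left unchanged.
after = [
    Stage("authoring", hours_per_task=2.0, hours_available=120.0),
    Stage("review", hours_per_task=3.0, hours_available=20.0),
]

for label, plan in (("before", before), ("after", after)):
    b = bottleneck(plan)
    print(f"{label}: bottleneck={b.name}, {b.tasks_per_week():.1f} tasks/week")
```

With these made-up inputs the limit moves from authoring (4.0 tasks/week) to review (about 6.7 tasks/week). At that point, adding authors does nothing; shifting hours into review or making review cheaper (rubrics, golden datasets, CI gates) is what raises throughput.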
Named uncertainty
- The curve is a vendor self-portrait. Independent replication on enterprise codebases is thin — treat the slope as a planning prior, not a forecast.
- "30-hour project" is throughput-shaped, not accuracy-shaped. The bottleneck moves from authorship to review, and the curve does not measure that shift.
- Domains with tight ground truth (compilers, infra) move up the curve faster than those without (product judgement, novel UX).
Author's take · Selva, 2026-04-27
Pick one in-flight project. Re-estimate it as if a 30-hour agent were a teammate. Where does the bottleneck move? That's where your next investment goes.
Personal opinion, not analysis. Dated above; revisit if the conditions change.
Conflict of interest: none.