Startup · ENGINEERING
INSIGHT · AI CAPABILITY · STRATEGIC · PUBLISHED · Apr 2026

The collapsing time horizon of code

Read original: Anthropic capability-curve slide, 2026

Decision this supports

How should we plan capability roadmaps against the observed agent time-horizon trend?

THE SOURCE

What was observed

Anthropic's capability curve compresses a year of engineering into hours. If your throughput is still measured in story points and sprints, you're benchmarking against the wrong baseline.

The capability curve

Anthropic timeline: AI coding capability from autocomplete (2021) to agents tackling projects (2026).
Source: Anthropic capability-curve slide, 2026. Reproduced for commentary.
What changed at each rung

2025 · Agents complete tasks (Claude Code, multi-file PRs) · AI works for: hours

An agent now plans, reads across the repo, edits dozens of files, runs tests, and arrives with a reviewable PR. The engineer's job shifts from author to reviewer-in-the-loop.

Stated trajectory

Anthropic frames the curve as a roughly logarithmic compression of engineering work. Over five years (2021 to 2026) it moves from autocomplete (seconds of work) to file edits, to tasks, to multi-hour project agents.

MY EVALUATION

Verdict

Context-dependent. Each rung up this curve is not an incremental productivity win: it changes what an engineer is for, what a team looks like, and what is worth building. The signal is real and the slope is uncertain; the second-order effects on team shape are the part worth planning against.

Directional read

Signal credibility: 4/5
Slope confidence: 2/5
Startup decision-relevance: 5/5
Time-to-act: 4/5

Conditions for adoption

  • Plan against this signal when: your product roadmap is bottlenecked by engineering capacity rather than market discovery. The curve's throughput claim moves the constraint.
  • Plan selectively when: the domain has tight ground truth (compilers, infra, data plumbing). Capability gains land there first; price the productivity in before hiring.
  • Discount when: the domain is product judgement, novel UX, or regulated decisions. The slope is real but slower; treat it as a 24-month prior, not a quarter-by-quarter signal.
  • Skip when: the team is sub-3 engineers and pre-PMF. Capability planning is a distraction from finding the market.

Implications for startup decisions

  1. Headcount-driven roadmap planning becomes obsolete the moment one engineer can ship a 30-hour project per day. Re-plan around throughput, not capacity.
  2. Code review, test coverage, and architectural taste become the scarce resource — not the act of typing code. Invest in evaluation infra (rubrics, golden datasets, CI gates) before scaling agent usage.
  3. Surface area of "things worth attempting" expands. Side experiments, internal tools, and one-off automations now have positive ROI. Maintain a parking lot, not a backlog.
  4. Senior engineers move from authors to editors. Hire and promote on judgement, system design, and review velocity rather than raw output.
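As one way to picture the "evaluation infra" in point 2, here is a minimal sketch of a golden-dataset CI gate. Everything in it is an assumption for illustration: the names `GoldenCase`, `pass_rate`, and `ci_gate`, the 0.95 threshold, and the toy `agent_written_normalize` function standing in for agent-produced code. It is not drawn from the Anthropic slide.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    """One golden-dataset entry: an input plus the output a reviewer signed off on."""
    name: str
    given: str
    expected: str


def pass_rate(cases: list[GoldenCase], candidate: Callable[[str], str]) -> float:
    """Fraction of golden cases the candidate reproduces exactly."""
    if not cases:
        raise ValueError("golden dataset is empty")
    passed = sum(1 for case in cases if candidate(case.given) == case.expected)
    return passed / len(cases)


def ci_gate(cases: list[GoldenCase], candidate: Callable[[str], str],
            threshold: float = 0.95) -> bool:
    """True if the candidate may merge. The 0.95 default is a hypothetical bar."""
    return pass_rate(cases, candidate) >= threshold


# Hypothetical golden dataset for a text-normalisation helper.
GOLDEN = [
    GoldenCase("lowercase", "Hello", "hello"),
    GoldenCase("trim", "  hi  ", "hi"),
    GoldenCase("both", "  MiXed ", "mixed"),
    GoldenCase("empty", "", ""),
]


def agent_written_normalize(text: str) -> str:
    """Stand-in for code an agent produced and a human must review."""
    return text.strip().lower()


if __name__ == "__main__":
    print("merge allowed:", ci_gate(GOLDEN, agent_written_normalize))
```

The design choice worth noting is that the gate encodes a reviewer's judgement once, in the dataset, so it can be reapplied to every agent-authored change without spending review time again.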

Named uncertainty

  • The curve is a vendor self-portrait. Independent replication on enterprise codebases is thin — treat the slope as a planning prior, not a forecast.
  • "30-hour project" is throughput-shaped, not accuracy-shaped. The bottleneck moves to review, not authorship; the curve does not measure that move.
  • Domains with tight ground truth (compilers, infra) move up the curve faster than those without (product judgement, novel UX).
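The claim that the bottleneck moves to review can be checked with back-of-envelope arithmetic. The sketch below assumes a serial pipeline with one person per stage and an 8-hour day; the stage names and hour figures are hypothetical placeholders, not measurements, and `bottleneck` is an illustrative helper rather than an established model.

```python
def bottleneck(stage_hours: dict[str, float],
               hours_per_day: float = 8.0) -> tuple[str, float]:
    """Return (slowest stage, projects per day) for a serial pipeline.

    Assumes one person per stage, so each stage can finish
    hours_per_day / hours_needed projects per day, and the pipeline
    as a whole runs at the rate of its slowest stage.
    """
    if not stage_hours:
        raise ValueError("need at least one stage")
    if any(hours <= 0 for hours in stage_hours.values()):
        raise ValueError("stage hours must be positive")
    rates = {stage: hours_per_day / hours for stage, hours in stage_hours.items()}
    slowest = min(rates, key=rates.get)
    return slowest, rates[slowest]


# Hypothetical human effort per project, in hours.
before = {"authoring": 30.0, "review": 4.0, "testing": 2.0}
# Hypothetical: an agent drafts the code and the engineer spends 1 hour steering it.
after = {**before, "authoring": 1.0}

if __name__ == "__main__":
    print("before:", bottleneck(before))
    print("after: ", bottleneck(after))
```

Under these made-up numbers the constraint shifts from authoring to review once authoring time collapses. Substituting a team's own stage estimates shows where its constraint would land.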

Author's take · Selva, 2026-04-27

Pick one in-flight project. Re-estimate it as if a 30-hour agent were a teammate. Where does the bottleneck move? That's where your next investment goes.

Personal opinion, not analysis. Dated above; revisit if the conditions change.

Conflict of interest: none.