Test-Time Compute Scaling
For years, AI progress followed a simple law: train bigger models, get better performance. Pour more compute into pretraining, and capabilities scaled predictably. Then OpenAI demonstrated a second lever: you can also scale capability by thinking harder at inference time.
This is test-time compute scaling—the realization that extending the reasoning process when the model is actually being used can produce dramatic improvements in capability, sometimes rivaling what would have required orders of magnitude more pretraining compute.
The o1 model that shocked researchers in 2024 didn’t just answer questions. It thought about them—spending computational resources on chain-of-thought reasoning, self-correction, and search before settling on answers. The results were striking: large gains on competition math, coding, and science benchmarks, gains that scaled with how long the model was allowed to think.
Why This Matters for Coherence
Coherence isn’t instantaneous. It emerges through process: exploring possibilities, checking consistency, refining understanding, and integrating evidence. Test-time compute scaling formalizes this intuition: intelligence isn’t just about what you know, but about how thoroughly you think through what you’re trying to figure out.
Understanding inference-time scaling helps us understand what thinking looks like when formalized as computational process—and what it means for systems to maintain coherence through extended reasoning.
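The simplest way to see "thinking as computational process" is self-consistency: sample many independent reasoning rollouts and majority-vote the final answers. The sketch below is a toy simulation, not a real model call—`sample_answer` is a hypothetical stand-in that lands on the right answer with some fixed probability—but it shows the core scaling effect: accuracy climbs as you spend more samples per question.

```python
import random
from collections import Counter

def sample_answer(rng, p_correct=0.4, n_wrong=4):
    # Toy stand-in for one chain-of-thought rollout: the "model" finds
    # the right answer with probability p_correct, otherwise picks one
    # of n_wrong distractors uniformly at random.
    if rng.random() < p_correct:
        return "correct"
    return f"wrong_{rng.randrange(n_wrong)}"

def majority_vote(rng, n_samples):
    # Self-consistency: take the plurality answer across rollouts.
    votes = Counter(sample_answer(rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

def accuracy(n_samples, trials=2000, seed=0):
    rng = random.Random(seed)
    hits = sum(majority_vote(rng, n_samples) == "correct"
               for _ in range(trials))
    return hits / trials

for n in (1, 5, 17, 65):
    print(f"samples={n:>2}  accuracy={accuracy(n):.3f}")
```

Even though a single rollout is right only 40% of the time, the correct answer is the *modal* answer, so voting over more rollouts pushes accuracy toward 1.0—a minimal model of why extended inference buys capability.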
What This Series Covers
This series explores test-time compute scaling and its implications for understanding intelligence, reasoning, and the future of AI. We’ll examine:
- How OpenAI discovered that inference scales like training
- The mechanics of extended reasoning: tree search, self-refinement, verification
- When to invest in training versus thinking
- How language models implement reasoning through tree search
- Self-correction and iterative refinement
- Business model implications of metered intelligence
- Connections between inference scaling and active inference
- What test-time compute teaches us about the nature of thinking
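Several of the mechanics listed above—tree search, refinement, verification—share one skeleton: generate candidate continuations, score them, keep the best, repeat. A minimal beam-search sketch makes that concrete. The `expand` and `score` hooks here are hypothetical placeholders (a toy digit-sum puzzle stands in for a language model proposing reasoning steps and a verifier rating them); widening the beam or deepening the search is exactly "spending more test-time compute."

```python
import heapq

def beam_search(initial, expand, score, beam_width=3, depth=4):
    """Keep the top-`beam_width` partial solutions at each depth.

    expand(state) -> successor states (stand-in for a model proposing
    next reasoning steps); score(state) -> higher is better (stand-in
    for a verifier). More width/depth = more inference compute.
    """
    beam = [initial]
    for _ in range(depth):
        candidates = [nxt for state in beam for nxt in expand(state)]
        if not candidates:
            break
        beam = heapq.nlargest(beam_width, candidates, key=score)
    return max(beam, key=score)

# Toy "reasoning" task: build a list of digits whose sum hits a target.
TARGET = 30
expand = lambda state: [state + [d] for d in range(10)]
score = lambda state: -abs(sum(state) - TARGET)

best = beam_search([], expand, score, beam_width=3, depth=4)
print(best, sum(best))
```

A greedy single path is just `beam_width=1`; raising the width trades compute for a better-explored search tree, which is the trade-off the articles on tree search and the train-vs-think decision examine in detail.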
By the end of this series, you’ll understand why the question “How do you make AI smarter?” has two answers—train bigger or think longer—and why the second answer might be more important than the first.
Articles in This Series
- The New Scaling Law: Why Thinking Harder Beats Training Bigger
- From o1 to o3: How OpenAI Discovered Inference Scaling
- Chain of Thought on Steroids: The Mechanics of Extended Reasoning
- The Compute Trade-Off: When to Train vs When to Think
- Tree Search in Language Models: Monte Carlo Meets GPT
- Self-Refinement and Verification: Models That Check Their Work
- The Economics of Inference: Pay-Per-Intelligence Business Models
- Test-Time Compute Meets Active Inference: Reasoning as Free Energy Minimization
- Synthesis: What Inference Scaling Teaches About the Nature of Thinking
Part of the FRONTIER SCIENCE collection. For more on how reasoning works, see Active Inference Applied and Mechanistic Interpretability.