Field Notes · Real-Time Reasoning

The Two-Track Mind

For three years, AI got smarter by thinking longer. A new architecture asks a harder question: how do you reason carefully when the world refuses to wait?

June 19, 2026 · Lisa Pedrosa · 10 min read · Intelligence
[Illustration: two tracks, "Slow · Plan" and "Fast · React", with the labels "Act now" and "World changes"]

Imagine a chess clock that never stops, only the board is a kitchen, a freeway, a trading floor — and every second you spend thinking, the position changes underneath you. This is the situation almost every useful machine actually faces, and until recently our smartest models were terrible at it. They could ponder beautifully, given world enough and time. They simply did not have the time.

The last three years of artificial intelligence were a celebration of slowness. The breakthrough that powered reasoning models — the ones that solve olympiad mathematics and debug sprawling codebases — was the discovery that a language model gets dramatically smarter if you let it think out loud before answering. Researchers called it test-time compute: spend more seconds, more tokens, more deliberation, and watch accuracy climb. It was a genuine revelation. But it carried a quiet assumption that almost nobody examined, because in a chatbot it is always true. The assumption was that the world holds still while you think.

A paper posted to arXiv last November punctures that assumption, and in doing so names a problem the field had been stepping around. The work, from a collaboration spanning Tsinghua University, Shanghai Jiao Tong University, Georgia Tech, and Stanford, introduces an architecture called AgileThinker and a deceptively simple idea to go with it: real-time reasoning, the predicament of an agent that must keep acting correctly even as the environment shifts mid-thought. It is the difference between solving a maze on paper and running through one while the walls move.

The tyranny of the single track

To understand why this is hard, picture the two obvious strategies, each of which fails in its own way. The first is to be purely reactive: respond instantly to whatever you see, no deliberation, pure reflex. A reactive agent never falls behind the clock, but it is shallow. It cannot plan three steps ahead, cannot reason about consequences, cannot hold a goal in mind through a detour. It is all instinct and no judgment.

The second strategy is to be a planner: stop, model the situation, reason carefully toward the optimal move. The planner is wise but slow, and slowness in a moving world is its own kind of error. By the time it finishes computing the perfect response to the situation it saw, that situation is gone. The plan is correct and useless — a perfect answer to a question nobody is asking anymore.

What the AgileThinker team built to measure this is as interesting as the model itself: a testing ground they call the Real-Time Reasoning Gym, a set of environments where the experimenter can turn a single dial, time pressure, and watch agents break. At low pressure, with seconds to spare, the slow planners win handily; careful thought pays off. As the dial turns up, something stark happens. The planners' accuracy collapses, not because they got dumber but because they ran out of clock. The reactive agents hold their timing but plateau at mediocrity. Every single-paradigm agent, the researchers found, eventually fails to be both correct and on time. No single strategy wins at every setting of the dial.
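The crossover described above can be reproduced with a toy model. The sketch below is not the Real-Time Reasoning Gym or its API; every name and number in it (`tracking_score`, `think_ticks`, `heuristic_error`, the tolerance of 10) is an assumption chosen for illustration. A target drifts at `speed` units per tick, a planner aims perfectly at where the target was when it started thinking, and a reactive agent aims almost immediately but with a fixed heuristic error.

```python
def tracking_score(error: float, tolerance: float = 10.0) -> float:
    """Map an aiming error to a score in [0, 1]; 1.0 means a perfect hit."""
    return max(0.0, 1.0 - error / tolerance)


def planner_score(speed: float, think_ticks: int = 8) -> float:
    # The plan is exact for the observed position, but the target keeps
    # drifting for think_ticks while the planner deliberates.
    return tracking_score(speed * think_ticks)


def reactive_score(speed: float, heuristic_error: float = 4.0) -> float:
    # The reflex acts after a single tick, so drift is small, but its
    # quick guess is always off by heuristic_error.
    return tracking_score(heuristic_error + speed)


# Turn the (hypothetical) time-pressure dial: faster drift = more pressure.
for speed in (0.0, 0.5, 1.0, 2.0):
    print(f"speed={speed:.1f}  planner={planner_score(speed):.2f}  "
          f"reactive={reactive_score(speed):.2f}")
```

With these made-up parameters the planner leads while the target is slow and drops to zero once drift outruns its thinking time, while the reactive agent never scores well but degrades gently.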

2 parallel threads running at once: one reacting, one planning
4 institutions behind the work: Tsinghua, SJTU, Georgia Tech, Stanford
1 approach that stayed both accurate and on time as pressure rose
~250 yrs since the idea of fast vs. slow cognition entered Western thought

Borrowing the brain's oldest trick

The solution AgileThinker reaches for is not new to psychology — it is new to machines. Instead of choosing between reacting and planning, it runs both at the same time, on two parallel threads. A fast reactive track generates a usable action on every tick of the clock, so the agent is never caught empty-handed. A slow planning track runs continuously in the background, uninterrupted, doing the deep work — and when it produces a better idea, that idea is folded into the fast track's behavior. The agent always has an answer ready, and the answer keeps getting smarter as long as time allows.
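The shape of that loop can be sketched in a few lines. This is a minimal single-threaded simulation, not AgileThinker's implementation: a Python generator stands in for the background planning thread, and `slow_planner`, `fast_policy`, `run`, and the `latency` parameter are all hypothetical names introduced here. The point is only the control flow: an action is emitted on every tick, and the newest finished plan is folded in whenever one arrives.

```python
from typing import Iterator, List, Optional


def slow_planner(latency: int) -> Iterator[Optional[str]]:
    """Stand-in for the planning track: yields None while still thinking,
    then a finished plan on every `latency`-th tick."""
    version = 0
    while True:
        for _ in range(latency - 1):
            yield None  # still deliberating on this tick
        version += 1
        yield f"plan-v{version}"


def fast_policy(observation: int, guidance: Optional[str]) -> str:
    """Stand-in for the reactive track: always returns an action at once,
    using the latest plan as guidance if one exists."""
    if guidance is None:
        return f"reflex({observation})"
    return f"reflex({observation})+{guidance}"


def run(ticks: int, latency: int) -> List[str]:
    planner = slow_planner(latency)
    guidance: Optional[str] = None
    actions: List[str] = []
    for tick in range(ticks):
        finished = next(planner)  # planning advances one step per tick
        if finished is not None:
            guidance = finished   # fold the newest plan into the fast track
        actions.append(fast_policy(tick, guidance))  # never empty-handed
    return actions


print(run(ticks=6, latency=3))
```

In a real system the planner would run concurrently and would reason about actual observations; the generator here only makes the timing deterministic and easy to inspect.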

Anyone who has read Daniel Kahneman will recognize the architecture immediately. In Thinking, Fast and Slow, Kahneman popularized the distinction between System 1 — automatic, instant, effortless — and System 2, the deliberate, sequential, effortful reasoning we do when we actually concentrate. Humans do not pick one. We run System 1 constantly and recruit System 2 when the stakes or the difficulty demand it, and the two are woven together so seamlessly that we rarely notice the seam. AgileThinker is, in effect, an engineering attempt to reproduce that weave.

The breakthrough was not a faster thinker. It was the refusal to choose between thinking fast and thinking well.
— On the AgileThinker architecture

What makes this moment notable is that the same idea is surfacing everywhere at once, which is usually how you can tell a field has found a real problem. Over the past two years a string of papers (the Talker-Reasoner architecture, dual-system frameworks for robotic manipulation, "thinking fast and slow" decision-makers) has converged on the same shape: a quick reflexive module paired with a slower deliberative one. The convergence is not coincidence. It is what happens when researchers chasing different applications all run into the same wall.

Why this matters most where AI touches the ground

That wall is most visible in the part of AI that has to live in the physical world. A chatbot can afford to think for thirty seconds; a robot arm reaching for a falling glass cannot. The dual-system design is precisely the architecture now spreading through robotics under the banner of vision-language-action models: systems where a slow, language-grounded module plays the role of System 2, surveying the scene and forming intentions, while a fast controller, often a diffusion model or a lightweight transformer, plays System 1 and translates intention into the split-second motor commands that keep a robot from tipping over.

The companies racing to put humanoid robots on factory floors have, in different words, all arrived at the dual-system answer. A robot that only reacts is unsafe and clumsy; a robot that only plans is frozen and useless. The one that might actually work is the one that does both, simultaneously, with the planning track quietly upgrading the reflexes. AgileThinker's contribution is to take that architectural intuition, which had been folk wisdom in robotics labs, and give it a rigorous testbed and a name.

The deepest implication is uncomfortable: for the past three years we measured intelligence by how long a model could afford to think. Real-time reasoning measures it by how well a model performs when it cannot afford to think at all — and those are not the same skill.

The benchmark that bends

The Real-Time Reasoning Gym deserves a moment of its own, because benchmarks shape what a field considers progress. For years the headline numbers in AI have been static-test scores: accuracy on a fixed set of problems, with no clock. Those benchmarks reward exactly the slow, deliberate reasoning that test-time compute supplies, which is part of why the field leaned so hard in that direction. A benchmark with an adjustable time dial changes the incentive. Suddenly an architecture is not graded on whether it can reach the right answer, but on whether it can reach a good-enough answer before the answer expires — and then improve it if the clock allows.
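One way to make that grading rule concrete is a deadline-aware score. The sketch below is an assumption about how such a rule could look, not the Gym's actual metric: answers are logged as hypothetical `(tick, quality)` pairs, and `deadline_score` credits only the latest revision submitted at or before the deadline.

```python
from typing import List, Tuple

Revision = Tuple[int, float]  # (tick submitted, answer quality in [0, 1])


def deadline_score(revisions: List[Revision], deadline: int) -> float:
    """Return the quality of the latest on-time answer, or 0.0 if none."""
    on_time = [(tick, quality) for tick, quality in revisions if tick <= deadline]
    if not on_time:
        return 0.0  # a late answer, however good, earns nothing
    _, quality = max(on_time, key=lambda rev: rev[0])
    return quality


# Illustrative logs: the planner answers once, late and well; the reactive
# agent answers once, early and roughly; the dual agent does both.
planner = [(8, 1.0)]
reactive = [(1, 0.5)]
dual = [(1, 0.5), (8, 1.0)]

for deadline in (3, 10):
    print(deadline,
          deadline_score(planner, deadline),
          deadline_score(reactive, deadline),
          deadline_score(dual, deadline))
```

Under this rule the dual log is never worse than either single-track log: it keeps the early answer when the deadline is tight and upgrades to the better one when the clock allows.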

[Figure: Performance as the clock speeds up. Accuracy of the plan, react, and dual agents under low and high time pressure.]
Illustrative pattern from the Real-Time Reasoning Gym: the planner crumbles as pressure rises, the reactive agent stays flat, and only the dual-track agent stays both accurate and on time.

It is worth being honest about the limits. AgileThinker is a research result, not a shipped product, and the Gym is a set of simulated environments with a tunable clock rather than the messy, unbounded chaos of the real world. Running two reasoning tracks in parallel costs compute, and the question of how to arbitrate between them (when to trust the reflex and when to wait for the plan) is far from settled. The paper is best read not as a finished answer but as a clean statement of a problem the field will be wrestling with for years.

A plan that arrives after the moment it was made for is not wisdom. It is just a very well-reasoned regret.
— On reasoning under time pressure

The clock was always running

There is a tidy historical irony here. Artificial intelligence research in the 1980s was preoccupied with exactly this problem under the name "reactive planning"; papers such as "Reactive Reasoning and Planning" (AAAI, 1987) worried constantly about agents that had to act in real time. Then deep learning arrived, the benchmarks went static, and for more than a decade the field could largely forget that the world moves. The chatbot era let us pretend the clock had stopped. AgileThinker is, in a sense, the rediscovery of an old anxiety with new tools powerful enough to do something about it.

What comes next is the harder engineering: making the two tracks share memory efficiently, letting the slow track interrupt itself gracefully when the world changes too much to salvage its current plan, and pushing all of it onto hardware small and fast enough to sit inside a robot rather than a data center. If that work succeeds, the machines that result will feel different to interact with — less like a brilliant friend who needs a long pause to answer, more like a competent one who responds immediately and gets sharper the longer you talk. We have spent three years teaching AI to think. The next chapter is teaching it to think and keep moving at the same time, which, as any human who has ever made a decision under pressure knows, is the only kind of thinking that the real world ever actually rewards.

Sources

  1. Zhang et al., "Real-Time Reasoning Agents in Evolving Environments" (AgileThinker), arXiv:2511.04898. arxiv.org/abs/2511.04898
  2. AgileThinker, full HTML preprint. arxiv.org/html/2511.04898v1
  3. "10 AI Research Breakthroughs That Matter for Builders (June 2026)," Build This Now. buildthisnow.com
  4. Kahneman, D. Thinking, Fast and Slow (2011) — System 1 / System 2 framework.
  5. "Agents Thinking Fast and Slow: A Talker-Reasoner Architecture," OpenReview. openreview.net
  6. "DSADF: Thinking Fast and Slow for Decision Making," arXiv:2505.08189. arxiv.org/pdf/2505.08189
  7. "Towards Synergistic, Generalized and Efficient Dual-System for Robotic Manipulation," arXiv:2410.08001. arxiv.org/html/2410.08001v1
  8. "Robots Thinking Fast and Slow: On Dual Process," Oxford ORA. ora.ox.ac.uk
  9. "ProAct: A Dual-System Framework for Proactive Embodied Social Agents," arXiv:2602.14048. arxiv.org/pdf/2602.14048
  10. "Vision Language Action Models (VLA) & Policies for Robots," LearnOpenCV. learnopencv.com
  11. "Efficient Agentic Reasoning Through Self-Regulated Simulative Planning," arXiv:2605.22138. arxiv.org/abs/2605.22138
  12. "Reactive Reasoning and Planning," AAAI (1987) — historical foundation. aaai.org
  13. "6 AI breakthroughs that will define 2026," InfoWorld. infoworld.com