LLMs Can Now Think, Read, and Act at the Same Time

Today's AI agents are bottlenecked by a fundamental architectural assumption: one stream of computation at a time. A model reads, then thinks, then writes. It cannot react to new information while generating a response. It cannot think while reading. Every action blocks every other action. For product engineers building coding agents, computer-use systems, or any tool-calling pipeline, this is not an abstract constraint. It shapes latency, responsiveness, and how much you can actually parallelize.

A new preprint from Su, Yang, Li, and Geiping proposes a direct fix. The core idea is called Multi-Stream LLMs. Instead of instruction-tuning models to handle a single sequential message format, you instruction-tune them to operate across multiple parallel streams of computation. Each role (user, system, chain-of-thought, tool) gets its own stream. Every forward pass then reads from multiple input streams and generates tokens across multiple output streams simultaneously. All streams remain causally dependent on earlier timesteps, so the model stays coherent.
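To make the lockstep picture concrete, here is a minimal Python sketch of a multi-stream decoding loop. It is an illustration under stated assumptions, not the authors' implementation: the stream names, the `<pad>` filler token, and the `step_fn` callback that stands in for a model forward pass are all hypothetical.

```python
from typing import Callable, Dict, List

PAD = "<pad>"  # hypothetical filler for a stream with nothing to emit this step

# One timestep holds one token (or PAD) for every stream.
Step = Dict[str, str]

INPUT_STREAMS = ("user", "tool")            # written by the environment
OUTPUT_STREAMS = ("thought", "assistant")   # written by the model


def run_multi_stream(
    step_fn: Callable[[List[Step]], Dict[str, str]],
    inputs: List[Dict[str, str]],
) -> List[Step]:
    """Advance all streams in lockstep, one timestep per model call."""
    history: List[Step] = []
    for incoming in inputs:
        # Causality: the model only sees completed, earlier timesteps.
        generated = step_fn(history)
        step = {name: incoming.get(name, PAD) for name in INPUT_STREAMS}
        step.update({name: generated.get(name, PAD) for name in OUTPUT_STREAMS})
        history.append(step)
    return history


def toy_step(history: List[Step]) -> Dict[str, str]:
    """Stand-in for a forward pass: emits one token per output stream."""
    t = len(history)
    last_user = history[-1]["user"] if history else PAD
    return {"thought": f"t{t}:saw={last_user}", "assistant": f"a{t}"}


# The user speaks at step 0 and a tool result arrives at step 2,
# while the model keeps writing to both output streams at every step.
history = run_multi_stream(toy_step, [{"user": "hi"}, {}, {"tool": "42"}])
for step in history:
    print(step)
```

The point of the sketch is the shape of the loop: input arriving on one stream never pauses generation on another, and each step depends only on what came before it.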

This is a training-time change, not an inference trick. The authors frame it as a data-driven approach: swap the format used during instruction-tuning, and the model learns to operate in parallel.

The practical payoff the paper describes is responsiveness: models can act while thinking, think while reading, and react to new input while still generating output. According to the authors, none of this is possible in the chat-style message exchange format that has persisted from early instruction-tuned models through today's advanced agents.

Beyond responsiveness, the authors argue the design improves efficiency through parallelization, improves security through better separation of concerns between streams, and can improve monitorability since the internal thought stream is structurally separated from the output stream. That last point matters for teams building systems that need to audit or filter model reasoning independently from what the model actually says to users.
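As a rough illustration of the monitorability argument, the Python sketch below audits a thought stream without touching the user-facing stream. The per-timestep dictionary layout, the stream names, the `<pad>` token, and the keyword check are all assumptions made for the example; the paper may represent streams differently.

```python
from typing import Dict, List

PAD = "<pad>"  # hypothetical filler for an idle stream


def stream_text(history: List[Dict[str, str]], name: str) -> str:
    """Join the non-padding tokens of a single stream into one string."""
    return " ".join(
        step[name] for step in history if step.get(name, PAD) != PAD
    )


def audit_thoughts(history: List[Dict[str, str]], banned: List[str]) -> List[str]:
    """Return the banned terms found in the thought stream only."""
    thought = stream_text(history, "thought").lower()
    return [term for term in banned if term.lower() in thought]


history = [
    {"thought": "user wants the API key", "assistant": "Sure,"},
    {"thought": PAD, "assistant": "here is a summary."},
]

flags = audit_thoughts(history, ["api key", "password"])
reply = stream_text(history, "assistant")
```

Because the streams are separate fields rather than interleaved text, the audit does not need to parse delimiters out of a single transcript to find the reasoning.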

The preprint runs 37 pages, and its code is available via the linked repository. The abstract cites no specific benchmark numbers, so treat the capability claims as directional until you read the full evaluation.

What should you do with this today? If you are building an agent architecture and feel the friction of sequential message loops, this paper is worth a careful read. The multi-stream framing gives a concrete vocabulary for a real pain point. If you are investing in custom fine-tuning pipelines, the instruction-tuning angle is the most actionable part: the authors are essentially saying you can teach a model to operate in parallel by changing the data format used during instruction-tuning, rather than by training a new model from scratch.