June 1, 2026
Three notable infrastructure releases today. vLLM v0.22.0 ships a multi-tier KV cache offloading framework and a 28.9% latency improvement for batch-invariant inference via CUTLASS FP8 support, plus an experimental Rust frontend worth watching. On the local side, Ollama 0.24 adds support for the Codex desktop app, which runs parallel coding threads with built-in worktree and git support. MiniMax M3 also lands on Vercel AI Gateway with a 1M-token context window, and it drops into existing AI SDK workflows with a single model string change.
Ollama 0.24 ships support for the Codex App, a desktop experience for running parallel coding threads with built-in worktree and git support. Builders can now annotate live local servers, review code, and leave comments without leaving the app.
MiniMax M3, a multimodal model with a 1M-token context window, is now accessible through Vercel AI Gateway. Builders can drop it into existing AI SDK workflows with a single model string change.
vLLM v0.22.0 lands major DeepSeek V4 hardening, an experimental Rust frontend, and a new multi-tier KV cache offloading framework. Batch-invariant inference also gets a 28.9% latency improvement via CUTLASS FP8 support.