May 29, 2026
Big infrastructure news lands alongside a real reliability warning today. Anthropic closed a $65B raise at a near-trillion-dollar valuation and shipped Claude Opus 4.8 with fast mode at a third of its previous cost, and Claude Code now spins up hundreds of parallel subagents automatically. On the reliability side, a Lenz Research study found that five frontier LLMs disagree on 67% of real fact-checks, which matters if your pipeline routes verification work to a single model with no fallback strategy.
Ollama 0.24 adds support for the Codex App, a desktop coding environment with built-in browser annotation, code review, and parallel worktrees. Developers can run it locally with a single command.
A little-known model called Hy3 preview from Tencent is now leading OpenRouter's token usage rankings by more than 50% over Claude, despite benchmark results that don't match its popularity. The explanation isn't model quality, and that should concern builders watching their API costs.
A developer noticed that LLM-assisted writing and AI-generated UI share recognizable fingerprints that have spread across the internet. If your product uses AI to generate content or interfaces, your users are starting to notice.
Anthropic has released Claude Opus 4.8 with benchmark improvements across coding, reasoning, and agentic tasks. Fast mode now costs a third of what it did on previous models, and new features ship alongside the upgrade.
Anthropic closed a $65B Series H at a $965B post-money valuation, with run-rate revenue crossing $47B. Here is what the compute and partnership expansion means for teams building on Claude today.
Anthropic just shipped dynamic workflows in Claude Code, letting it write its own orchestration scripts and run tens to hundreds of parallel subagents in a single session. Work that used to take quarters can now finish in days.
A Lenz Research study put 1,000 real user claims to five frontier LLMs and found the panel splits on 67% of them. For builders routing fact-checking workloads to a single model, that number is a red flag.
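For builders wondering what a fallback strategy for single-model fact-checking might look like, here is a minimal sketch. It is not the method used in the Lenz Research study. It assumes each model is wrapped in a plain Python callable that takes a claim and returns a verdict string; the names `panel_verdict`, `min_agreement`, and the `"needs_review"` label are hypothetical and chosen only for illustration. The idea is to ask several models, accept the majority verdict only when agreement is high enough, and otherwise escalate the claim.

```python
from collections import Counter
from typing import Callable, Dict

# Hypothetical verdict labels such as "true", "false", "unverifiable".
Verdict = str


def panel_verdict(
    claim: str,
    models: Dict[str, Callable[[str], Verdict]],
    min_agreement: float = 0.8,
) -> dict:
    """Ask every model in the panel and accept the majority verdict only
    if the share of models backing it reaches min_agreement."""
    if not models:
        raise ValueError("panel needs at least one model")

    # Collect one verdict per model. Each callable is an assumed wrapper
    # around whatever API client your pipeline already uses.
    votes = {name: check(claim) for name, check in models.items()}

    # Most common verdict and the fraction of the panel that chose it.
    top, count = Counter(votes.values()).most_common(1)[0]
    agreement = count / len(votes)

    return {
        "claim": claim,
        "votes": votes,
        "agreement": agreement,
        # Below the threshold, flag the claim for a human or a second pass.
        "verdict": top if agreement >= min_agreement else "needs_review",
    }


if __name__ == "__main__":
    # Stub models standing in for real API calls.
    panel = {
        "model_a": lambda claim: "true",
        "model_b": lambda claim: "true",
        "model_c": lambda claim: "false",
        "model_d": lambda claim: "true",
        "model_e": lambda claim: "false",
    }
    result = panel_verdict("Example claim to verify.", panel)
    print(result["verdict"], result["agreement"])
```

With the stub panel above, three of five models agree, which falls below the default 0.8 threshold, so the claim is flagged for review instead of being accepted. The threshold is a tuning knob: a lower value accepts more split decisions, a higher one sends more claims to review.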