← Journal

May 24, 2026

Two releases stand out today. vLLM v0.21.0 ships KV offloading via Hybrid Memory Allocator and speculative decoding support for reasoning budgets, but the mandatory C++20 build requirement will break existing setups, so teams running inference infrastructure need to act before upgrading. On the agent side, Hermes v0.14.0 delivers a local OpenAI-compatible proxy that lets tools like Codex, Aider, and Cline hit Claude Pro or ChatGPT Pro without API keys, and cuts cold-start time by roughly 19 seconds.

research
Better Token Credit Assignment Closes the RLVR Reasoning Gap

A new method called DelTA reshapes how reinforcement learning updates propagate to individual tokens during LLM training, boosting math reasoning scores by over 3 points on top baselines. Engineers building reasoning-focused models now have a concrete technique to reduce noise from high-frequency formatting tokens polluting gradient updates.

framework
Nanobot v0.2 Teaches Agents to Remember Why They Started

Nanobot v0.2.0 ships goal persistence via a new /goal system that keeps an active objective pinned in Runtime Context across compaction and long tool chains. The release also bundles the WebUI inside the pip wheel and refactors the agent loop into a functional state machine with five new model providers.

infra_api
vLLM 0.21 Brings HMA, Spec Decode Budgets, and a Build Break

vLLM v0.21.0 lands KV offloading with Hybrid Memory Allocator integration, speculative decoding support for reasoning budgets, and a C++20 build requirement that will break existing setups. Here is what teams running inference infrastructure need to act on now.

framework
Hermes 0.14 Brings Local OpenAI Proxy, SuperGrok OAuth, and Faster Starts

Hermes Agent v0.14.0 ships a local OpenAI-compatible proxy that lets coding tools like Codex, Aider, and Cline hit Claude Pro or ChatGPT Pro without API keys. The release also cuts cold-start time by roughly 19 seconds and lands xAI Grok via SuperGrok OAuth with a 1M token context window.

infra_api
Firecrawl v2.10 Adds File Parsing, Lockdown Mode, and Smarter Scrape Formats

Firecrawl v2.10 ships a /parse endpoint for local file ingestion, a Lockdown Mode for zero-outbound scraping, and new question and highlights formats that cut token usage by up to 100x. Go, Ruby, and PHP SDKs join the official lineup.