Liquid AI released LFM2.5-8B-A1B today, and the delta from its predecessor matters if you are building agentic features that need to run locally.
The previous model, LFM2-8B-A1B, shipped in October 2025 with 12T tokens of pretraining and a limited context window. LFM2.5-8B-A1B expands that context window to 128K tokens, scales pretraining to 38T tokens, and adds large-scale reinforcement learning on top. Liquid also doubled the vocabulary size to improve tokenization efficiency for non-Latin languages. That last change is meaningful if your users are not writing in English or other Latin-script languages.
The core design goal is chaining tool calls reliably on consumer hardware. This is not a cloud-first model that happens to have a quantized version. It is built from the ground up for edge inference. Liquid claims it is the fastest in its size class on both CPU and GPU inference.
Day-one support lands for llama.cpp, MLX, vLLM, and SGLang. That covers the main local inference paths developers actually use. You do not need to wait for community ports.
On the eval side, Liquid points to the AA-Omniscience Index, which rewards correct answers and penalizes hallucinations on a scale of negative 100 to 100. They position LFM2.5-8B-A1B as competitive with much larger dense and MoE models on instruction following and agentic tasks. More detailed scores are available on Artificial Analysis.
Both the base model (LFM2.5-8B-A1B-Base) and the post-trained model (LFM2.5-8B-A1B) are on Hugging Face now. Liquid also published docs covering how to run and fine-tune them locally.
If you are shipping a product that requires an on-device personal assistant, a local agentic loop, or tool-use features that cannot depend on a network call, this is worth a direct evaluation today. Pull the model from Hugging Face, run it through your tool-calling benchmark on target hardware, and check whether the 128K context window covers your longest prompts. The fine-tuning docs are already available if you need domain adaptation.