Lfm2ForCausalLM, Lfm2MoeForCausalLM, and Lfm2VlForConditionalGeneration), so there’s no --trust-remote-code and nothing to patch.
The vLLM cookbook keeps the serve flags, sampling presets, and tuning notes current. This page gets you from install to first request.
LFM2.5 vLLM Cookbook
Serving commands, sampling settings, and hardware notes for LFM2.5 on vLLM, verified on real hardware and updated with each release.
What’s in the cookbook
- Commands for every model. The full LFM2.5 matrix (dense, MoE, reasoning, Japanese, base, and vision), with the exact command for each.
- Reasoning and tool calling. When to pass
--reasoning-parser qwen3and--tool-call-parser lfm2, and how the parsed output comes back. - Recommended sampling. Per-checkpoint presets from the model cards.
- Vision. Serving the VL models and sending image-and-text turns.
- Blackwell tuning and benchmarks. Throughput-impacting flags, with benchmark numbers.
- One-command Modal deploy. A ready-to-run script for cloud GPUs.
Quickstart
Install vLLM. LFM2.5’s dense, MoE, and VL architectures ship in v0.23.0 and later:This is the minimal path. The per-model and per-hardware flags (
--tool-call-parser lfm2, --reasoning-parser qwen3, vision options, and Blackwell tuning) live in the cookbook, where they stay in sync with each release.Sampling parameters
LFM2.5 ships per-model sampling presets on its Hugging Face model cards. The cookbook’s sampling table has the exact values for every checkpoint. Two details matter:top_k, min_p, and repetition_penalty are vLLM extras, so pass them through extra_body rather than as top-level arguments; and leave max_tokens unset unless you mean to cap output, since a low cap cuts off the reasoning models mid-thought.