lfm2 tool-call parser and <think> reasoning, so a single sglang serve puts any LFM2.5 checkpoint behind a production endpoint.
The SGLang cookbook generates the launch command for each model and GPU, including parsers and Blackwell attention backends. This page gets a server running.
LFM2.5 SGLang Cookbook
An interactive command generator for LFM2.5 on SGLang, verified on real hardware and updated with each release.
What’s in the cookbook
- A command for every model and GPU. An interactive generator for the verified
sglang serveline for your hardware. - Reasoning and tool calling. The right
--reasoning-parserper model and--tool-call-parser lfm2, included in each command. - Blackwell attention backends. The
--attention-backendand--mm-attention-backendchoices that matter on B200/B300. - Vision tuning. Backends and memory flags for the VL models, with throughput numbers.
- Recommended sampling. Per-checkpoint presets from the model cards.
- Benchmarks. Measured TTFT, TPOT, and throughput per model and GPU.
Quickstart
Install SGLang from the official guide:lmsysorg/sglang dev image include LFM2.5 support: the dense / MoE / VL model classes and the lfm2 tool-call parser. On an older release, install from source or use the dev image. The cookbook lists the minimum version.
Serve an LFM2.5 model behind an OpenAI-compatible API:
This covers the dense Instruct model. The right
--reasoning-parser for the thinking models, the vision and Blackwell attention backends, and the verified command for every model and GPU are generated by the cookbook. Use it for flags that change across releases.Sampling parameters
LFM2.5 uses per-model sampling presets. The cookbook’s configuration tips list the exact values per checkpoint. Pass them on every request, since some checkpoints ship no defaults ingeneration_config.json and the server won’t apply them for you. Put top_k, min_p, and repetition_penalty in extra_body, and leave max_tokens unset unless you mean to cap output.