# SGLang

> Serve LFM2.5 with SGLang, a low-latency OpenAI-compatible serving framework. The full recipe, verified on real hardware, lives in the SGLang cookbook.

SGLang serves LFM2.5 at low latency under high concurrency and exposes an OpenAI-compatible API. The dense, MoE, and vision-language models are supported natively, along with the `lfm2` tool-call parser and `<think>` reasoning, so a single `sglang serve` command puts any LFM2.5 checkpoint behind a production endpoint.

<Tip>
  Reach for SGLang when you want the lowest latency at high concurrency. It requires a CUDA GPU. On CPU-only machines, use [llama.cpp](/deployment/on-device/llama-cpp) instead.
</Tip>

The SGLang cookbook generates the launch command for each model and GPU, including parsers and Blackwell attention backends. This page gets a server running.

<Card title="LFM2.5 SGLang Cookbook" icon="book-open" href="https://docs.sglang.ai/cookbook/autoregressive/LiquidAI/LFM2.5">
  An interactive command generator for LFM2.5 on SGLang, verified on real hardware and updated with each release.
</Card>

## What's in the cookbook

* **A command for every model and GPU.** An interactive generator for the verified `sglang serve` line for your hardware.
* **Reasoning and tool calling.** The right `--reasoning-parser` per model and `--tool-call-parser lfm2`, included in each command.
* **Blackwell attention backends.** The `--attention-backend` and `--mm-attention-backend` choices that matter on B200/B300.
* **Vision tuning.** Backends and memory flags for the VL models, with throughput numbers.
* **Recommended sampling.** Per-checkpoint presets from the model cards.
* **Benchmarks.** Measured TTFT, TPOT, and throughput per model and GPU.

## Quickstart

Install SGLang from the [official guide](https://docs.sglang.io/get_started/install_sglang.html):

```bash
pip install --upgrade pip
# uv is not bundled with pip, so install it before calling `uv pip`
pip install uv
uv pip install sglang
```

Recent SGLang releases and the `lmsysorg/sglang` dev image include LFM2.5 support: the dense / MoE / VL model classes and the `lfm2` tool-call parser. On an older release, install from source or use the dev image. The cookbook lists the minimum version.

Serve an LFM2.5 model behind an OpenAI-compatible API:

```bash
sglang serve --model-path LiquidAI/LFM2.5-1.2B-Instruct --tool-call-parser lfm2
```

<Note>
  This command covers the dense Instruct model only. The [cookbook](https://docs.sglang.ai/cookbook/autoregressive/LiquidAI/LFM2.5) generates the right `--reasoning-parser` for the thinking models, the vision and Blackwell attention backends, and the verified command for every model and GPU. Use it for flags that change across releases.
</Note>
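
Before sending requests, it can help to confirm that the server is up. The sketch below is one way to do that using only the standard library. It assumes the server listens on port 30000 (the base URL used by the client example on this page) and answers the OpenAI-style `GET /v1/models` route. `parse_model_ids` and `list_model_ids` are illustrative helper names, not part of SGLang.

```python
import json
import urllib.request

# Assumed base URL; change it if you launched the server with a different --port.
BASE_URL = "http://localhost:30000/v1"


def parse_model_ids(payload):
    """Return the model ids from an OpenAI-style model list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_model_ids(base_url=BASE_URL, timeout=5.0):
    """Fetch /models from the server and return the ids it reports."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))
```

Once the model has finished loading, calling `list_model_ids()` should return a list that includes the served model path. A connection error usually means the server is still starting.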

Send your first request with the OpenAI client:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="LiquidAI/LFM2.5-1.2B-Instruct",
    messages=[{"role": "user", "content": "What is C. elegans? Answer in one sentence."}],
    temperature=0.1,
    extra_body={"top_k": 50, "repetition_penalty": 1.05},
)
print(response.choices[0].message.content)
```
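
Because the server was started with `--tool-call-parser lfm2`, the same endpoint can also take OpenAI-style `tools`. What follows is a minimal sketch of building such a request and decoding the result, not a verified recipe. The `get_weather` tool and both helper names are hypothetical, and it assumes the parser returns calls in the standard `tool_calls` field of the assistant message.

```python
import json

# Hypothetical tool definition in the OpenAI function-calling schema.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(model, user_text, tools):
    """Build keyword arguments for client.chat.completions.create."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "tools": tools,
    }


def decode_tool_calls(message):
    """Turn an assistant message dict into (name, arguments) pairs.

    In the OpenAI format the arguments arrive as a JSON string, so they are
    decoded here. A message without tool calls yields an empty list.
    """
    calls = message.get("tool_calls") or []
    return [
        (call["function"]["name"], json.loads(call["function"]["arguments"]))
        for call in calls
    ]


request = build_tool_request(
    "LiquidAI/LFM2.5-1.2B-Instruct", "What is the weather in Boston?", [WEATHER_TOOL]
)
# With a running server and the client from the example above:
#   response = client.chat.completions.create(**request)
#   calls = decode_tool_calls(response.choices[0].message.model_dump())
```

Keeping request construction separate from the network call makes the payload easy to inspect or log before it is sent.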

## Sampling parameters

LFM2.5 uses per-model sampling presets. The [cookbook's configuration tips](https://docs.sglang.ai/cookbook/autoregressive/LiquidAI/LFM2.5#2-configuration-tips) list the exact values per checkpoint. Pass them on every request, since some checkpoints ship no defaults in `generation_config.json` and the server won't apply them for you. Put `top_k`, `min_p`, and `repetition_penalty` in `extra_body`, and leave `max_tokens` unset unless you mean to cap output.
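
One way to apply a preset consistently is to split it once into top-level arguments and `extra_body`, then reuse the result on every request. The sketch below assumes that split. `build_chat_kwargs` is a hypothetical helper, and the preset reuses the quickstart values as placeholders, so substitute the cookbook's values for your checkpoint.

```python
# Keys the OpenAI chat-completions schema accepts as top-level arguments.
STANDARD_KEYS = {"temperature", "top_p", "max_tokens"}
# Keys outside that schema, which go in extra_body as described above.
EXTRA_KEYS = {"top_k", "min_p", "repetition_penalty"}


def build_chat_kwargs(model, messages, preset):
    """Split a sampling preset into top-level arguments and extra_body."""
    unknown = set(preset) - STANDARD_KEYS - EXTRA_KEYS
    if unknown:
        raise ValueError(f"unrecognized sampling keys: {sorted(unknown)}")
    kwargs = {"model": model, "messages": messages}
    kwargs.update({k: v for k, v in preset.items() if k in STANDARD_KEYS})
    extra = {k: v for k, v in preset.items() if k in EXTRA_KEYS}
    if extra:
        kwargs["extra_body"] = extra
    return kwargs


# Placeholder preset (the quickstart values), not the cookbook's numbers.
# max_tokens is left out on purpose so output is not capped.
PRESET = {"temperature": 0.1, "top_k": 50, "repetition_penalty": 1.05}

kwargs = build_chat_kwargs(
    "LiquidAI/LFM2.5-1.2B-Instruct",
    [{"role": "user", "content": "What is C. elegans? Answer in one sentence."}],
    PRESET,
)
# With a running server: client.chat.completions.create(**kwargs)
```

Rejecting unknown keys catches a misspelled parameter locally instead of sending it to the server.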
