# vLLM

> Serve LFM2.5 with vLLM, a high-throughput OpenAI-compatible inference engine. The full recipe, verified on real hardware, lives in the vLLM cookbook.

vLLM is built for high-throughput production serving of LFM2.5. Point it at an LFM2.5 checkpoint to start an OpenAI-compatible endpoint with one command. The dense, MoE, and vision-language models all have native implementations in vLLM (`Lfm2ForCausalLM`, `Lfm2MoeForCausalLM`, and `Lfm2VlForConditionalGeneration`), so you don't need `--trust-remote-code` or any patches.

<Tip>
  Reach for vLLM when you need high-throughput serving, batch processing, or an OpenAI-compatible API. It requires a CUDA GPU. On CPU-only machines, use [llama.cpp](/deployment/on-device/llama-cpp) instead.
</Tip>

The vLLM cookbook keeps the serve flags, sampling presets, and tuning notes current. This page gets you from install to first request.

<Card title="LFM2.5 vLLM Cookbook" icon="book-open" href="https://docs.vllm.ai/projects/recipes/en/latest/LiquidAI/LFM2.5.html">
  Serving commands, sampling settings, and hardware notes for LFM2.5 on vLLM, verified on real hardware and updated with each release.
</Card>

## What's in the cookbook

* **Commands for every model.** The full LFM2.5 matrix (dense, MoE, reasoning, Japanese, base, and vision), with the exact command for each.
* **Reasoning and tool calling.** When to pass `--reasoning-parser qwen3` and `--tool-call-parser lfm2`, and how the parsed output comes back.
* **Recommended sampling.** Per-checkpoint presets from the model cards.
* **Vision.** Serving the VL models and sending image-and-text turns.
* **Blackwell tuning and benchmarks.** Flags that affect throughput, with benchmark numbers.
* **One-command Modal deploy.** A ready-to-run script for cloud GPUs.

## Quickstart

Install vLLM. Native support for the LFM2.5 dense, MoE, and VL architectures ships in vLLM **v0.23.0** and later, so upgrade if you have an older install:

```bash
uv pip install -U vllm
```

Serve any LFM2.5 model behind an OpenAI-compatible API:

```bash
vllm serve LiquidAI/LFM2.5-1.2B-Instruct
```

<Note>
  This is the minimal path. The per-model and per-hardware flags (`--tool-call-parser lfm2`, `--reasoning-parser qwen3`, vision options, and Blackwell tuning) live in the [cookbook](https://docs.vllm.ai/projects/recipes/en/latest/LiquidAI/LFM2.5.html), where they stay in sync with each release.
</Note>

Send your first request with the OpenAI client:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="LiquidAI/LFM2.5-1.2B-Instruct",
    messages=[{"role": "user", "content": "What is C. elegans? Answer in one sentence."}],
    temperature=0.1,
    extra_body={"top_k": 50, "repetition_penalty": 1.05},
)
print(response.choices[0].message.content)
```
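`vllm serve` has to load the model weights before it accepts connections, so a request sent right after launch may fail with a connection error. The sketch below polls the OpenAI-compatible `/v1/models` route until the server lists at least one model. It assumes the quickstart server on `localhost:8000`; `list_served_models` and `wait_until_ready` are hypothetical helper names, and the timeout defaults are arbitrary.

```python
import json
import time
import urllib.request


def list_served_models(base_url: str) -> list[str]:
    """Return the model ids reported by the OpenAI-compatible /models route."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
        payload = json.load(resp)
    return [entry["id"] for entry in payload.get("data", [])]


def wait_until_ready(base_url, timeout_s=300.0, interval_s=2.0, fetch=list_served_models):
    """Poll until the server lists a model, or raise TimeoutError.

    `fetch` is injectable so the loop can be exercised without a live server.
    The defaults are illustrative assumptions, not recommended values.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            models = fetch(base_url)
        except OSError:
            # Connection refused, reset, or timed out: the server is not up yet.
            models = []
        if models:
            return models
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no models served at {base_url} after {timeout_s}s")
        time.sleep(interval_s)
```

Calling `wait_until_ready("http://localhost:8000/v1")` before creating the client returns the served model ids, which should include the name you passed to `vllm serve`.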

## Sampling parameters

LFM2.5 publishes per-model sampling presets on its Hugging Face model cards, and the [cookbook's sampling table](https://docs.vllm.ai/projects/recipes/en/latest/LiquidAI/LFM2.5.html#recommended-sampling) lists the exact values for every checkpoint. Two details matter when you apply them:

* **Pass vLLM-specific parameters through `extra_body`.** `top_k`, `min_p`, and `repetition_penalty` are vLLM extensions to the OpenAI API, so the OpenAI client does not accept them as top-level arguments.
* **Leave `max_tokens` unset unless you mean to cap output.** A low cap cuts off the reasoning models mid-thought.
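The split between top-level arguments and `extra_body` can be sketched as a small helper that turns a flat preset into keyword arguments for `client.chat.completions.create`. This is an illustration, not part of vLLM or the OpenAI client: `split_sampling` and the two key sets are hypothetical, and the example values are the ones from the quickstart request, not a recommended preset for any checkpoint.

```python
from typing import Any

# Sampling keys the OpenAI Python client accepts as named arguments
# (a partial list, chosen for this sketch).
OPENAI_KEYS = {"temperature", "top_p", "max_tokens", "presence_penalty",
               "frequency_penalty", "seed", "stop"}

# Keys that are vLLM extensions and must travel in `extra_body`.
VLLM_EXTRA_KEYS = {"top_k", "min_p", "repetition_penalty"}


def split_sampling(preset: dict[str, Any]) -> dict[str, Any]:
    """Split a flat sampling preset into create() keyword arguments."""
    kwargs: dict[str, Any] = {}
    extra: dict[str, Any] = {}
    for key, value in preset.items():
        if value is None:
            # Unset values are omitted entirely, e.g. max_tokens left uncapped.
            continue
        if key in VLLM_EXTRA_KEYS:
            extra[key] = value
        elif key in OPENAI_KEYS:
            kwargs[key] = value
        else:
            raise ValueError(f"unknown sampling key: {key}")
    if extra:
        kwargs["extra_body"] = extra
    return kwargs


# Values copied from the quickstart request; max_tokens stays unset.
sampling = split_sampling(
    {"temperature": 0.1, "top_k": 50, "repetition_penalty": 1.05, "max_tokens": None}
)
```

The result can be unpacked into the call as `client.chat.completions.create(model=..., messages=..., **sampling)`. Rejecting unknown keys is a design choice for this sketch, so that a misspelled parameter fails loudly instead of being dropped.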
