# LFM2.5-350M

> Ultra-compact 350M-parameter model for edge devices and low-latency deployments

export const TextLlamacpp = ({ggufRepo, samplingFlags}) => <div>
<p><strong>Install:</strong></p>
<pre className="shiki shiki-themes github-light github-dark" style={{
  backgroundColor: '#fff',
  '--shiki-dark-bg': '#24292e',
  color: '#24292e',
  '--shiki-dark': '#e1e4e8'
}} language="bash">
<code language="bash">
{`brew install llama.cpp`.split('\n').map((line, i) => <span key={i} className="line">{line}{'\n'}</span>)}
</code>
</pre>
<p><strong>Run:</strong></p>
<pre className="shiki shiki-themes github-light github-dark" style={{
  backgroundColor: '#fff',
  '--shiki-dark-bg': '#24292e',
  color: '#24292e',
  '--shiki-dark': '#e1e4e8'
}} language="bash">
<code language="bash">
{`llama-cli -hf ${ggufRepo} -c 4096 --color -i \\
    ${samplingFlags}`.split('\n').map((line, i) => <span key={i} className="line">{line}{'\n'}</span>)}
</code>
</pre>
<p>The <code>-hf</code> flag downloads the model directly from Hugging Face. For other installation methods and advanced usage, see the <a href="/docs/inference/llama-cpp">llama.cpp guide</a>.</p>
</div>;

export const TextSglang = ({modelId, toolCallParser, samplingParams}) => <div>
<p><strong>Install:</strong></p>
<pre className="shiki shiki-themes github-light github-dark" style={{
  backgroundColor: '#fff',
  '--shiki-dark-bg': '#24292e',
  color: '#24292e',
  '--shiki-dark': '#e1e4e8'
}} language="bash">
<code language="bash">
{`uv pip install "sglang>=0.5.10"`.split('\n').map((line, i) => <span key={i} className="line">{line}{'\n'}</span>)}
</code>
</pre>
<p><strong>Launch server:</strong></p>
<pre className="shiki shiki-themes github-light github-dark" style={{
  backgroundColor: '#fff',
  '--shiki-dark-bg': '#24292e',
  color: '#24292e',
  '--shiki-dark': '#e1e4e8'
}} language="bash">
<code language="bash">
{(toolCallParser ? `sglang serve \\
    --model-path ${modelId} \\
    --host 0.0.0.0 \\
    --port 30000 \\
    --tool-call-parser ${toolCallParser}` : `sglang serve \\
    --model-path ${modelId} \\
    --host 0.0.0.0 \\
    --port 30000`).split('\n').map((line, i) => <span key={i} className="line">{line}{'\n'}</span>)}
</code>
</pre>
<p><strong>Query (OpenAI-compatible):</strong></p>
<pre className="shiki shiki-themes github-light github-dark" style={{
  backgroundColor: '#fff',
  '--shiki-dark-bg': '#24292e',
  color: '#24292e',
  '--shiki-dark': '#e1e4e8'
}} language="python">
<code language="python">
{`from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="None")

response = client.chat.completions.create(
    model="${modelId}",
    messages=[{"role": "user", "content": "What is machine learning?"}],
    ${samplingParams || "temperature=0.3,"}
)

print(response.choices[0].message.content)`.split('\n').map((line, i) => <span key={i} className="line">{line}{'\n'}</span>)}
</code>
</pre>
</div>;

export const TextVllm = ({modelId, samplingParams, maxTokens}) => <div>
<p><strong>Install:</strong></p>
<pre className="shiki shiki-themes github-light github-dark" style={{
  backgroundColor: '#fff',
  '--shiki-dark-bg': '#24292e',
  color: '#24292e',
  '--shiki-dark': '#e1e4e8'
}} language="bash">
<code language="bash">
{`pip install vllm==0.14`.split('\n').map((line, i) => <span key={i} className="line">{line}{'\n'}</span>)}
</code>
</pre>
<p><strong>Run:</strong></p>
<pre className="shiki shiki-themes github-light github-dark" style={{
  backgroundColor: '#fff',
  '--shiki-dark-bg': '#24292e',
  color: '#24292e',
  '--shiki-dark': '#e1e4e8'
}} language="python">
<code language="python">
{`from vllm import LLM, SamplingParams

llm = LLM(model="${modelId}")

sampling_params = SamplingParams(${samplingParams}max_tokens=${maxTokens || 512})

output = llm.chat([{"role": "user", "content": "What is machine learning?"}], sampling_params)
print(output[0].outputs[0].text)`.split('\n').map((line, i) => <span key={i} className="line">{line}{'\n'}</span>)}
</code>
</pre>
</div>;

export const TextTransformers = ({modelId, samplingParams}) => <div>
<p><strong>Install:</strong></p>
<pre className="shiki shiki-themes github-light github-dark" style={{
  backgroundColor: '#fff',
  '--shiki-dark-bg': '#24292e',
  color: '#24292e',
  '--shiki-dark': '#e1e4e8'
}} language="bash">
<code language="bash">
{`pip install "transformers>=5.2.0" torch accelerate`.split('\n').map((line, i) => <span key={i} className="line">{line}{'\n'}</span>)}
</code>
</pre>
<p><strong>Download & Run:</strong></p>
<pre className="shiki shiki-themes github-light github-dark" style={{
  backgroundColor: '#fff',
  '--shiki-dark-bg': '#24292e',
  color: '#24292e',
  '--shiki-dark': '#e1e4e8'
}} language="python">
<code language="python">
{`from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "${modelId}"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is machine learning?"}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
    return_dict=True,
).to(model.device)

output = model.generate(**inputs, ${samplingParams}max_new_tokens=512)
input_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(output[0][input_length:], skip_special_tokens=True)
print(response)`.split('\n').map((line, i) => <span key={i} className="line">{line}{'\n'}</span>)}
</code>
</pre>
</div>;

<a href="/lfm/models/text-models" className="back-button">← Back to Text Models</a>

LFM2.5-350M is Liquid AI's smallest LFM2.5 text model, designed for edge devices with strict memory and compute constraints. Built on the LFM2.5 architecture with extended pre-training and reinforcement learning, it delivers improved chat, instruction-following, and tool-calling performance over LFM2-350M while keeping the same compact footprint.

<div style={{display: 'flex', gap: '0.5rem', margin: '0.5rem 0 1.5rem 0'}}>
  <a href="https://huggingface.co/LiquidAI/LFM2.5-350M" style={{padding: '0.35rem 0.7rem', borderRadius: '4px', fontSize: '0.85rem', fontWeight: 600, textDecoration: 'none', backgroundColor: '#fbbf24'}}><span style={{color: '#000'}}>HF</span></a>
  <a href="https://huggingface.co/LiquidAI/LFM2.5-350M-GGUF" style={{padding: '0.35rem 0.7rem', borderRadius: '4px', fontSize: '0.85rem', fontWeight: 600, textDecoration: 'none', backgroundColor: '#60a5fa'}}><span style={{color: '#000'}}>GGUF</span></a>
  <a href="https://huggingface.co/LiquidAI/LFM2.5-350M-MLX-8bit" style={{padding: '0.35rem 0.7rem', borderRadius: '4px', fontSize: '0.85rem', fontWeight: 600, textDecoration: 'none', backgroundColor: '#c4b5fd'}}><span style={{color: '#000'}}>MLX</span></a>
  <a href="https://huggingface.co/LiquidAI/LFM2.5-350M-ONNX" style={{padding: '0.35rem 0.7rem', borderRadius: '4px', fontSize: '0.85rem', fontWeight: 600, textDecoration: 'none', backgroundColor: '#86efac'}}><span style={{color: '#000'}}>ONNX</span></a>
</div>

## Specifications

| Property       | Value          |
| -------------- | -------------- |
| Parameters     | 350M           |
| Context Length | 32K tokens     |
| Architecture   | LFM2.5 (Dense) |

<div className="use-cases">
  <CardGroup cols={3}>
    <Card title="Ultra-Light" icon="feather">
      Minimal memory and compute footprint
    </Card>

    <Card title="Tool Calling" icon="wrench">
      Native function calling support
    </Card>

    <Card title="Edge-Ready" icon="microchip">
      Runs on mobile and embedded devices
    </Card>
  </CardGroup>
</div>
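
As a rough illustration of the memory footprint, the size of the weights can be estimated from the parameter count in the Specifications table. The sketch below assumes a nominal 350,000,000 parameters and idealized storage widths per precision; both are assumptions, not published figures. Real checkpoints and quantized files carry extra metadata and scaling factors, and runtime memory also includes the KV cache and activations, so treat the output as ballpark numbers rather than measurements.

```python
# Back-of-the-envelope weight-memory estimate (assumptions, not measurements).
PARAMS = 350_000_000  # nominal count taken from "350M"; the exact figure may differ

# Idealized storage width per parameter, in bytes (ignores quantization overhead).
BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_mb(num_params: int, bytes_per_param: float) -> float:
    """Return the approximate weight size in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


for precision, width in BYTES_PER_PARAM.items():
    print(f"{precision:>8}: ~{weight_memory_mb(PARAMS, width):.0f} MB")
```

Under these assumptions, bfloat16 weights come to roughly 700 MB, and an 8-bit variant to roughly half of that.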

## Quick Start

<Tabs>
  <Tab title="Transformers">
    <TextTransformers modelId="LiquidAI/LFM2.5-350M" samplingParams="do_sample=True, temperature=0.1, top_k=50, repetition_penalty=1.05, " />
  </Tab>

  <Tab title="llama.cpp">
    <TextLlamacpp ggufRepo="LiquidAI/LFM2.5-350M-GGUF" samplingFlags="--temp 0.1 --top-k 50 --repeat-penalty 1.05" />
  </Tab>

  <Tab title="vLLM">
    <TextVllm modelId="LiquidAI/LFM2.5-350M" samplingParams="temperature=0.1, top_k=50, repetition_penalty=1.05, " />
  </Tab>

  <Tab title="SGLang">
    <TextSglang modelId="LiquidAI/LFM2.5-350M" toolCallParser="lfm2" samplingParams="temperature=0.1," />
  </Tab>
</Tabs>
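
The SGLang tab launches the server with `--tool-call-parser lfm2`, which suggests tool calls can be requested through the OpenAI-compatible endpoint. The following is a minimal sketch of one tool-calling round trip, assuming the server returns structured `tool_calls` in the standard OpenAI response shape. The `get_weather` tool, `TOOL_REGISTRY`, `dispatch_tool_call`, and `run_tool_round` are hypothetical names introduced for illustration only; they are not part of any Liquid AI or SGLang API.

```python
import json

MODEL_ID = "LiquidAI/LFM2.5-350M"


def get_weather(city: str) -> dict:
    """Hypothetical local tool; returns canned data instead of a real lookup."""
    return {"city": city, "forecast": "sunny", "temperature_c": 21}


# Hypothetical registry mapping tool names to local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}

# Tool schema in the OpenAI "tools" format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run a registered tool and return its result (or an error) as JSON."""
    if name not in TOOL_REGISTRY:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(arguments_json or "{}")
        return json.dumps(TOOL_REGISTRY[name](**args))
    except (json.JSONDecodeError, TypeError) as exc:
        # Small models can emit malformed arguments; report instead of crashing.
        return json.dumps({"error": str(exc)})


def run_tool_round(client, question: str) -> str:
    """Ask a question, execute any requested tools, and return the final answer."""
    messages = [{"role": "user", "content": question}]
    first = client.chat.completions.create(
        model=MODEL_ID, messages=messages, tools=TOOLS, temperature=0.1
    )
    reply = first.choices[0].message
    if not reply.tool_calls:
        return reply.content  # the model answered directly

    messages.append(reply)  # keep the assistant turn that requested the tools
    for call in reply.tool_calls:
        result = dispatch_tool_call(call.function.name, call.function.arguments)
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})

    final = client.chat.completions.create(
        model=MODEL_ID, messages=messages, tools=TOOLS, temperature=0.1
    )
    return final.choices[0].message.content


# Usage against the SGLang server above (requires `pip install openai`):
#   from openai import OpenAI
#   client = OpenAI(base_url="http://localhost:30000/v1", api_key="None")
#   print(run_tool_round(client, "What is the weather in Paris?"))
```

Keeping tool execution in a separate dispatcher means malformed or unknown calls are returned to the model as JSON errors instead of raising, which is a cautious default for a model of this size.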
