# LFM2.5-350M

> Ultra-compact 350M parameter model for edge devices and low-latency deployments

**llama.cpp** (`${ggufRepo}` and `${samplingFlags}` are filled in per model)

Install:


```bash
brew install llama.cpp
```

Run:


```bash
llama-cli -hf ${ggufRepo} -c 4096 --color -i \
    ${samplingFlags}
```

The `-hf` flag downloads the model directly from Hugging Face, `-c 4096` sets a 4,096-token context window, and `-i` starts an interactive session. For other installation methods and advanced usage, see the llama.cpp guide.
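The llama.cpp command is a template in which `${ggufRepo}` and `${samplingFlags}` are substituted per model. A minimal Python sketch of that substitution, assuming hypothetical values for both placeholders (the repository name and sampling flags below are illustrative, not the published ones for this model), builds the argument list for `llama-cli`:

```python
import shlex


def llama_cli_argv(gguf_repo: str, sampling_flags: str, ctx_size: int = 4096) -> list[str]:
    """Fill the template: llama-cli -hf <repo> -c <ctx> --color -i <sampling flags>."""
    # shlex.split honours shell quoting, so "--temp 0.1" becomes two separate argv items.
    return [
        "llama-cli", "-hf", gguf_repo,
        "-c", str(ctx_size),
        "--color", "-i",
        *shlex.split(sampling_flags),
    ]


# Hypothetical placeholder values, for illustration only.
argv = llama_cli_argv("example-org/example-model-GGUF", "--temp 0.1 --top-k 50")
print(shlex.join(argv))
```

The resulting list could be passed to `subprocess.run(argv)`; using a list instead of a single shell string avoids quoting problems when the sampling flags contain spaces.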

**SGLang** (`${modelId}`, `${toolCallParser}`, and `${samplingParams}` are filled in per model)

Install:


```bash
uv pip install "sglang>=0.5.10"
```

Launch server:


```bash
sglang serve \
    --model-path ${modelId} \
    --host 0.0.0.0 \
    --port 30000 \
    --tool-call-parser ${toolCallParser}
```

When the model has no `toolCallParser`, drop the last line (and the trailing `\` after `--port 30000`).

Query (OpenAI-compatible):


```python
from openai import OpenAI

# The local server above sets no API key, so any placeholder string is accepted.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="None")

response = client.chat.completions.create(
    model="${modelId}",
    messages=[{"role": "user", "content": "What is machine learning?"}],
    temperature=0.3,  # default; replaced by the model's ${samplingParams} when set
)

print(response.choices[0].message.content)
```
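When the server is launched with `--tool-call-parser`, the OpenAI-compatible endpoint can return structured tool calls. A minimal client-side sketch, assuming a hypothetical `get_weather` tool: `TOOLS` follows the OpenAI chat-completions function schema (it would be passed as `tools=TOOLS`), and `run_tool_calls` executes each returned call locally. The `SimpleNamespace` object only mimics the shape of `response.choices[0].message.tool_calls` so the sketch runs without a server.

```python
import json
from types import SimpleNamespace


def get_weather(city: str) -> str:
    """Hypothetical local tool; a real one would query a weather service."""
    return f"Sunny in {city}"


# OpenAI-style tool schema, passed as tools=TOOLS to client.chat.completions.create(...).
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]
REGISTRY = {"get_weather": get_weather}


def run_tool_calls(tool_calls) -> list[dict]:
    """Execute each tool call and build the 'tool' messages to send back to the model."""
    results = []
    for call in tool_calls or []:
        func = REGISTRY[call.function.name]
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        results.append({"role": "tool", "tool_call_id": call.id, "content": func(**args)})
    return results


# Stand-in for response.choices[0].message.tool_calls, so the sketch runs offline.
fake_call = SimpleNamespace(
    id="call_0",
    function=SimpleNamespace(name="get_weather", arguments='{"city": "Paris"}'),
)
print(run_tool_calls([fake_call]))
```

In a full loop, the assistant message and these `tool` messages would be appended to `messages` and sent back to the endpoint so the model can answer using the tool output.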

**vLLM** (`${modelId}`, `${samplingParams}`, and `${maxTokens}` are filled in per model)

Install:


```bash
pip install vllm==0.14
```

Run:


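```python
from vllm import LLM, SamplingParams

llm = LLM(model="${modelId}")

# Add the model's ${samplingParams} (e.g. temperature) before max_tokens;
# 512 is the default when ${maxTokens} is not set.
sampling_params = SamplingParams(max_tokens=512)

# LLM.chat expects a list of chat messages, not a bare string.
messages = [{"role": "user", "content": "What is machine learning?"}]
output = llm.chat(messages, sampling_params)
print(output[0].outputs[0].text)
```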
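`LLM.chat` takes one conversation as a list of `{"role", "content"}` dicts, and also accepts a list of such conversations to process a batch in one call. A minimal sketch of building those inputs, using a hypothetical `build_conversation` helper; the optional system prompt is an illustrative assumption, not a recommendation from this page:

```python
def build_conversation(user_turns, assistant_turns=(), system=None):
    """Interleave turns so the conversation ends on a user message, ready for generation."""
    if len(assistant_turns) != len(user_turns) - 1:
        raise ValueError("need exactly one fewer assistant turn than user turns")
    messages = [{"role": "system", "content": system}] if system else []
    for i, user in enumerate(user_turns):
        messages.append({"role": "user", "content": user})
        if i < len(assistant_turns):
            messages.append({"role": "assistant", "content": assistant_turns[i]})
    return messages


# A batch is simply a list of conversations.
batch = [
    build_conversation(["What is machine learning?"]),
    build_conversation(
        ["Name a sorting algorithm.", "What is its time complexity?"],
        ["Merge sort."],
    ),
]

# With the vLLM objects created above: outputs = llm.chat(batch, sampling_params)
for conv in batch:
    print([m["role"] for m in conv])
```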

**Transformers** (`${modelId}` and `${samplingParams}` are filled in per model)

Install:


```bash
pip install "transformers>=5.2.0" torch accelerate
```

Download & Run:


```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "${modelId}"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Apply the model's chat template, tokenize, and move the tensors to the model's device.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is machine learning?"}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
    return_dict=True,
).to(model.device)

# Add the model's ${samplingParams} (sampling settings) before max_new_tokens.
output = model.generate(**inputs, max_new_tokens=512)

# generate() returns prompt + completion; decode only the newly generated tokens.
input_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(output[0][input_length:], skip_special_tokens=True)
print(response)
```
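`generate()` returns the prompt tokens followed by the new tokens, which is why the example slices `output[0][input_length:]` before decoding. The same bookkeeping on plain Python lists, with made-up token IDs and a hypothetical `strip_prompt` helper, shows what the slice keeps:

```python
def strip_prompt(sequence, prompt):
    """Drop the echoed prompt from a generated sequence, as output[0][input_length:] does."""
    if list(sequence[: len(prompt)]) != list(prompt):
        raise ValueError("sequence does not start with the prompt")
    return list(sequence[len(prompt):])


prompt_ids = [1, 15, 27, 42]          # made-up token IDs for the templated prompt
generated = prompt_ids + [88, 91, 2]  # generate() echoes the prompt, then appends new tokens
print(strip_prompt(generated, prompt_ids))  # [88, 91, 2]
```

With left-padded batches, every row shares the same padded prompt length, so a single `input_length` still marks where the new tokens begin.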

LFM2.5-350M is Liquid AI's smallest LFM2.5 text model, designed for edge devices with strict memory and compute constraints. Built on the LFM2.5 architecture with extended pre-training and reinforcement learning, it delivers improved chat, instruction-following, and tool-calling performance over LFM2-350M while keeping the same compact footprint.

Available formats: HF, GGUF, MLX, ONNX

## Specifications

| Property       | Value          |
| -------------- | -------------- |
| Parameters     | 350M           |
| Context Length | 32K tokens     |
| Architecture   | LFM2.5 (Dense) |
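The context length bounds the prompt and the completion together. A minimal sketch of that budget check, assuming "32K" means 32,768 tokens (the table does not give the exact figure) and using a hypothetical `max_new_tokens_budget` helper; the same arithmetic applies to the smaller `-c 4096` window in the llama.cpp command:

```python
CONTEXT_LENGTH = 32_768  # assumption: "32K tokens" read as 32 * 1024


def max_new_tokens_budget(prompt_tokens: int, requested: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Clamp the requested completion length so prompt + completion fits the context window."""
    if prompt_tokens >= context_length:
        raise ValueError(f"prompt uses {prompt_tokens} tokens; window is {context_length}")
    return min(requested, context_length - prompt_tokens)


print(max_new_tokens_budget(1_000, 512))         # 512: fits easily
print(max_new_tokens_budget(32_500, 512))        # 268: only 268 tokens left
print(max_new_tokens_budget(3_900, 512, 4_096))  # 196: llama-cli run with -c 4096
```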

- Minimal memory and compute footprint
- Native function calling support
- Runs on mobile and embedded devices
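As a rough illustration of the small footprint, weight memory is approximately parameter count × bytes per parameter. The sketch below applies that rule of thumb to the nominal 350M parameters at a few common precisions. It is an approximation only: it ignores activations, the KV cache, runtime overhead, and the per-block scale data of quantized formats such as GGUF, so actual file sizes and RAM usage will differ.

```python
PARAMS = 350_000_000  # nominal count from the specifications table

# Approximate storage per weight; the 4-bit entry ignores per-block scale overhead.
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_memory_mb(params: int, bytes_per_param: float) -> float:
    """Weights only, in megabytes (10**6 bytes)."""
    return params * bytes_per_param / 1e6


for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype:>9}: ~{weight_memory_mb(PARAMS, nbytes):,.0f} MB")
```

The bfloat16 row (about 700 MB) corresponds to the `dtype="bfloat16"` load used in the Transformers example.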

## Quick Start