LFM2.5-8B-A1B - Liquid Docs

LFM2.5-8B-A1B is Liquid AI’s Mixture-of-Experts model. It has 8B total parameters, of which only 1.5B are active per forward pass, and it supports a 128K-token context window and chain-of-thought reasoning. The model delivers strong performance in tool calling and agentic tasks while running on-device.

HF GGUF MLX ONNX

Specifications

Property	Value
Parameters	8B (1.5B active)
Context Length	128K tokens
Architecture	LFM2.5 (MoE)
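
As a rough illustration of what the parameter counts in the table imply, the sketch below computes the share of weights used per forward pass and a back-of-the-envelope estimate of weight storage. The rounded counts (8e9 and 1.5e9) and the byte widths (2 bytes per parameter for bfloat16, 0.5 bytes for a 4-bit quantization) are assumptions for illustration only; real checkpoints carry extra overhead such as quantization scales and the KV cache.

```python
# Back-of-the-envelope numbers derived from the specification table.
# The counts are rounded; exact checkpoint sizes may differ.
TOTAL_PARAMS = 8e9     # "8B" total parameters
ACTIVE_PARAMS = 1.5e9  # "1.5B" active per forward pass


def active_fraction(total: float, active: float) -> float:
    """Share of the weights that take part in one forward pass."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), ignoring overhead."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active fraction: {frac:.2%}")
    print(f"bf16 weights:  ~{weight_memory_gb(TOTAL_PARAMS, 2):.0f} GB")
    print(f"4-bit weights: ~{weight_memory_gb(TOTAL_PARAMS, 0.5):.0f} GB")
```

Memory is sized by the total parameter count because every expert has to be resident, while per-token compute follows the active count.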

128K Context

Extended context window for long documents and conversations

MoE Efficiency

8B-parameter model capacity at roughly the per-token inference cost of a 1.5B model

Tool Calling

Native function calling for agentic workflows
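
To make the function-calling workflow more concrete, here is a minimal sketch of the application side: a tool described with a JSON-schema style dictionary and a small dispatcher that runs a parsed call. Everything named here (`get_weather`, `WEATHER_TOOL`, `dispatch`, and the `{"name": ..., "arguments": ...}` call shape) is hypothetical and not taken from the Liquid docs; the exact format in which the model emits tool calls should be checked against the model card.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool: return a canned weather report for a city."""
    return f"Sunny in {city}"


# JSON-schema style description of the tool (an assumed, commonly used shape).
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# Registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather}


def dispatch(call: dict) -> dict:
    """Run one already-parsed tool call and wrap the result as a tool message."""
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    result = TOOLS[name](**call["arguments"])
    return {"role": "tool", "content": json.dumps({"name": name, "result": result})}


if __name__ == "__main__":
    print(dispatch({"name": "get_weather", "arguments": {"city": "Paris"}}))
```

In Transformers, a schema like `WEATHER_TOOL` can be passed through the `tools` argument of `tokenizer.apply_chat_template`; the returned tool message would then be appended to the conversation before generating again.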

Quick Start

Quick start with Transformers (compatible with transformers>=5.0.0):

from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "LiquidAI/LFM2.5-8B-A1B"

# Load the weights in bfloat16 and place them on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
    # attn_implementation="flash_attention_2",  # uncomment on a compatible GPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Print tokens as they are generated, hiding the prompt and special tokens.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

prompt = "What is C. elegans?"

# Build the chat-formatted prompt; return_dict=True yields input_ids and attention_mask.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
    return_dict=True,
).to(model.device)

output = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.2,
    top_k=80,
    repetition_penalty=1.05,
    max_new_tokens=8192,
    streamer=streamer,
)
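
The `temperature`, `top_k`, and `repetition_penalty` arguments above shape the next-token distribution before sampling. The sketch below shows, in plain Python, one common way these three steps combine. It is an illustrative approximation rather than the library's actual implementation, and the function names `next_token_probs` and `sample` are hypothetical.

```python
import math
import random


def next_token_probs(logits, generated_ids, temperature=0.2, top_k=80,
                     repetition_penalty=1.05):
    """Apply repetition penalty, temperature, and top-k, then softmax."""
    scores = list(logits)

    # Repetition penalty: lower the score of tokens that were already generated.
    for tok in set(generated_ids):
        s = scores[tok]
        scores[tok] = s / repetition_penalty if s > 0 else s * repetition_penalty

    # Temperature: values below 1 sharpen the distribution.
    scores = [s / temperature for s in scores]

    # Top-k: keep the k highest scores and mask out the rest.
    k = min(top_k, len(scores))
    cutoff = sorted(scores, reverse=True)[k - 1]
    scores = [s if s >= cutoff else -math.inf for s in scores]

    # Numerically stable softmax over the surviving scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng=random):
    """Draw one token id according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    probs = next_token_probs([2.0, 1.0, 0.5, -1.0], generated_ids=[0], top_k=2)
    print(probs, sample(probs))
```

With a low temperature such as 0.2, most of the probability mass lands on the top-scoring token, so sampling stays close to greedy decoding while still allowing some variation.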
