LFM2-700M

LFM2-700M is a compact model that balances capability and efficiency. It is suitable for deployment on a wide range of devices, including phones, tablets, and laptops with limited resources.

Available formats: HF, GGUF, MLX, ONNX

Specifications

Property	Value
Parameters	700M
Context Length	32K tokens
Architecture	LFM2 (Dense)

Edge Deployment

Optimized for resource-constrained devices

Low Latency

Fast inference for real-time applications

Fine-tunable

TRL compatible (SFT, DPO, GRPO)
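As a rough illustration of what TRL compatibility could look like in practice, the sketch below wraps prompt/response pairs in the conversational "messages" format and hands them to TRL's `SFTTrainer`. The helper names `to_chat_record` and `run_sft`, the output directory, and the default training configuration are all assumptions made for this example, not settings from this page; it also assumes `trl` and `datasets` are installed.

```python
def to_chat_record(prompt, response):
    """Wrap one prompt/response pair in the conversational "messages" format."""
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": response},
        ]
    }


def run_sft(pairs, output_dir="lfm2-700m-sft"):
    """Hypothetical helper: supervised fine-tuning on (prompt, response) pairs."""
    # Imported here so to_chat_record can be used without TRL installed.
    from datasets import Dataset
    from trl import SFTConfig, SFTTrainer

    train_dataset = Dataset.from_list([to_chat_record(p, r) for p, r in pairs])
    trainer = SFTTrainer(
        model="LiquidAI/LFM2-700M",
        args=SFTConfig(output_dir=output_dir),  # defaults only; tune for real runs
        train_dataset=train_dataset,
    )
    trainer.train()
    return trainer


example = to_chat_record("What is machine learning?", "A field of study.")
print(example["messages"][0]["role"], example["messages"][1]["role"])
```

Calling `run_sft` downloads the model and starts training, so it is left uncalled here. DPO and GRPO use different trainers and dataset layouts and are not covered by this sketch.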

Quick Start

Transformers
llama.cpp
vLLM

Install (Transformers):

pip install "transformers>=5.0.0" torch accelerate

Download & Run:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-700M"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is machine learning?"}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
    return_dict=True,
).to(model.device)

output = model.generate(**inputs, do_sample=True, temperature=0.3, min_p=0.15, repetition_penalty=1.05, max_new_tokens=512)
input_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(output[0][input_length:], skip_special_tokens=True)
print(response)
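To give some intuition for the `temperature=0.3` and `min_p=0.15` settings passed to `generate` above, here is a minimal pure-Python sketch of the two steps. It assumes temperature scaling is applied before the min-p cut-off; libraries differ in the order they apply samplers, so treat it as an illustration and not as any library's exact implementation. The function name `min_p_filter` and the toy logits are made up for this example.

```python
import math


def min_p_filter(logits, temperature=0.3, min_p=0.15):
    """Return renormalized probabilities after temperature scaling and min-p filtering."""
    # Dividing logits by a temperature below 1 sharpens the distribution.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Min-p: drop tokens whose probability is below min_p times the top probability.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    kept_total = sum(kept)
    return [p / kept_total for p in kept]


toy_logits = [2.0, 1.5, 0.5, -1.0]  # made-up scores for four candidate tokens
probs = min_p_filter(toy_logits)
print([round(p, 3) for p in probs])
```

With these toy logits only the two highest-scoring tokens survive the cut-off; the next token would then be sampled from the remaining probabilities.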

Install (llama.cpp):

brew install llama.cpp

Run:

llama-cli -hf LiquidAI/LFM2-700M-GGUF -c 4096 --color -i \
    --temp 0.3 --min-p 0.15 --repeat-penalty 1.05

The -hf flag downloads the model directly from Hugging Face. For other installation methods and advanced usage, see the llama.cpp guide.
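Note that `-c 4096` in the command above sets the context size to 4,096 tokens, which is smaller than the 32K context length listed under Specifications. The small sketch below shows how many tokens remain for generation once a prompt occupies part of the window. It assumes "32K" means 32,768 tokens, and the helper name `max_new_tokens_budget` is hypothetical.

```python
CONTEXT_LENGTH = 32 * 1024  # assumption: "32K tokens" means 32,768


def max_new_tokens_budget(prompt_tokens, context_window=CONTEXT_LENGTH):
    """Tokens left for generation once the prompt occupies part of the window."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context_window - prompt_tokens, 0)


print(max_new_tokens_budget(3000, context_window=4096))  # 1096
print(max_new_tokens_budget(3000))  # 29768
```

Raising the value passed to `-c` leaves more room for long prompts, at the cost of more memory.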

Install (vLLM):

pip install vllm==0.14

Run:

from vllm import LLM, SamplingParams

llm = LLM(model="LiquidAI/LFM2-700M")

sampling_params = SamplingParams(temperature=0.3, min_p=0.15, repetition_penalty=1.05, max_tokens=512)

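# LLM.chat expects a list of chat messages rather than a bare string
messages = [{"role": "user", "content": "What is machine learning?"}]
output = llm.chat(messages, sampling_params)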
print(output[0].outputs[0].text)
