LFM2.5-VL-450M-Extract is the compact vision extraction Nano for turning images into structured JSON. It uses the same YAML-schema prompting pattern as the larger VL Extract model while keeping the deployment footprint small.

Specifications

Property: Value
Parameters: 450M total (350M LM + ~100M vision encoder)
Context Length: 128K tokens
Image Input: Single image, dynamic resolution
Task: Vision structured extraction
Output Format: JSON
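
As a rough illustration of what the parameter count means for deployment, the sketch below estimates weights-only memory from the Parameters entry above. The bytes-per-parameter figures are standard for these dtypes, but the helper name `weight_memory_gb` is hypothetical, and the estimate ignores activations, the KV cache, and framework overhead, so real usage will be higher.

```python
# Weights-only memory estimate. The parameter count comes from the
# specification above and is approximate (350M LM + ~100M vision encoder).
PARAMS = 450_000_000

# Storage cost per parameter for two common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Return the size of the weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(PARAMS, dtype):.1f} GB")
# float32: ~1.8 GB
# bfloat16: ~0.9 GB
```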

Edge Extraction

Structured image extraction for small devices.

Visual Tagging

Label image attributes with schema control.

Low Latency

Fast extraction for high-volume workflows.

Prompting Recipe

Use greedy decoding for best results. This model is intended for single-image, single-turn extraction.
Describe the fields to extract as YAML in the system prompt, one name: description pair per line, then provide the image as the user message. For example:
wood_color: The overall coloration of the wood surface
wood_texture: The tactile quality of the wood surface
wood_pattern: The pattern types visible on the wood surface
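
The schema above can also be generated from a plain mapping rather than written by hand. The following is a minimal sketch, not an official helper: `build_system_prompt` is a hypothetical name, and the wrapper sentences are copied from the Quick Start prompt below. It assumes descriptions are single-line strings that need no YAML quoting.

```python
def build_system_prompt(fields: dict[str, str]) -> str:
    """Render a field -> description mapping as the YAML-schema system prompt.

    Hypothetical helper for illustration. Assumes each description is a
    single line with no characters that would require YAML quoting.
    """
    # One "name: description" line per field, in insertion order.
    fields_yaml = "\n".join(
        f"{name}: {description}" for name, description in fields.items()
    )
    return (
        "Extract the following from the image:\n\n"
        f"{fields_yaml}\n\n"
        "Respond with only a JSON object. "
        "Do not include any text outside the JSON."
    )


fields = {
    "wood_color": "The overall coloration of the wood surface",
    "wood_texture": "The tactile quality of the wood surface",
    "wood_pattern": "The pattern types visible on the wood surface",
}
system_prompt = build_system_prompt(fields)
print(system_prompt)
```

Keeping the schema in a dictionary makes it easy to reuse the same field names later, for instance when checking that the returned JSON contains every requested key.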

Quick Start

Install:
pip install "transformers>=5.1.0" pillow torch accelerate
Run:
from transformers import AutoModelForImageTextToText, AutoProcessor
from transformers.image_utils import load_image

model_id = "LiquidAI/LFM2.5-VL-450M-Extract"
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = load_image("https://huggingface.co/LiquidAI/LFM2.5-VL-450M-Extract/resolve/main/sample_image.png")
fields_yaml = """wood_color: The overall coloration of the wood surface
wood_texture: The tactile quality of the wood surface
wood_pattern: The pattern types visible on the wood surface"""

system_prompt = f"""Extract the following from the image:

{fields_yaml}

Respond with only a JSON object. Do not include any text outside the JSON."""

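# Single-turn conversation: the schema goes in the system prompt,
# and the image is the only user content.
conversation = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": [{"type": "image", "image": image}]},
]

# Apply the chat template and tokenize in one step, then move tensors to the model's device.
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    tokenize=True,
).to(model.device)

# Greedy decoding (do_sample=False), as recommended in the prompting recipe.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens by slicing off the prompt.
response = processor.batch_decode(
    outputs[:, inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)[0]
print(response)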
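
Since the response is a JSON string, downstream code will usually want to parse and check it before use. The sketch below shows one way to do that with the standard library. `parse_extraction` is a hypothetical helper, and `sample_response` is a made-up string standing in for model output, not a real result from this model. Slicing from the first `{` to the last `}` is a defensive assumption in case extra text surrounds the JSON.

```python
import json


def parse_extraction(response: str, expected_fields: list[str]) -> dict:
    """Parse a JSON reply and verify that every requested field is present.

    Hypothetical helper for illustration. Raises ValueError if no JSON
    object is found, the JSON is malformed, or a field is missing.
    """
    # Defensive assumption: keep only the outermost {...} span in case the
    # reply carries stray text or a code fence around the JSON.
    start, end = response.find("{"), response.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response")

    # json.JSONDecodeError is a subclass of ValueError.
    data = json.loads(response[start : end + 1])

    missing = [name for name in expected_fields if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data


# Made-up example string, NOT actual model output.
sample_response = (
    '{"wood_color": "brown", "wood_texture": "smooth", "wood_pattern": ["grain"]}'
)
expected = ["wood_color", "wood_texture", "wood_pattern"]
result = parse_extraction(sample_response, expected)
print(result["wood_color"])
```

In the Quick Start, `response` would take the place of `sample_response`, and the expected field names would be the keys of the YAML schema.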