LFM2.5-VL-1.6B-Extract extracts user-defined fields from images and returns structured JSON. It extends the Extract family to vision workflows, using a YAML field list in the system prompt to define exactly what to extract.

Specifications

Property | Value
Parameters | 1.6B total (1.2B LM + ~400M vision encoder)
Context Length | 128K tokens
Image Input | Single image, dynamic resolution
Task | Vision structured extraction
Output Format | JSON

Image Inspection

Extract visual attributes into JSON.

Retail Tagging

Auto-tag product images with structured fields.

Safety Signals

Detect visual events for automated workflows.

Prompting Recipe

Use greedy decoding (do_sample=False) for best results. This model is intended for single-image, single-turn extraction.
Describe the fields to extract as YAML in the system prompt, one "name: description" pair per line, then provide the image as the user message. For example:
wood_color: The overall coloration of the wood surface
wood_texture: The tactile quality of the wood surface
wood_pattern: The pattern types visible on the wood surface
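
The field list above can also be generated from a Python dict rather than written by hand. The following is a minimal sketch, not part of the official recipe: the helper name `build_system_prompt` is hypothetical, the instruction wording is copied from the Quick Start prompt, and it assumes each description is a plain single-line string (no YAML quoting or escaping is attempted).

```python
def build_system_prompt(fields: dict[str, str]) -> str:
    """Render {name: description} pairs as YAML lines inside the system prompt.

    Hypothetical helper for illustration; assumes single-line descriptions.
    """
    if not fields:
        raise ValueError("at least one field is required")
    # One "name: description" line per field, in insertion order.
    fields_yaml = "\n".join(f"{name}: {desc}" for name, desc in fields.items())
    return (
        "Extract the following from the image:\n\n"
        f"{fields_yaml}\n\n"
        "Respond with only a JSON object. Do not include any text outside the JSON."
    )


fields = {
    "wood_color": "The overall coloration of the wood surface",
    "wood_texture": "The tactile quality of the wood surface",
    "wood_pattern": "The pattern types visible on the wood surface",
}
system_prompt = build_system_prompt(fields)
print(system_prompt)
```

Keeping the fields in a dict makes it easy to reuse the same names later when checking the keys of the returned JSON.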

Quick Start

Install:
pip install "transformers>=5.1.0" pillow torch accelerate
Run:
from transformers import AutoModelForImageTextToText, AutoProcessor
from transformers.image_utils import load_image

model_id = "LiquidAI/LFM2.5-VL-1.6B-Extract"
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = load_image("https://huggingface.co/LiquidAI/LFM2.5-VL-1.6B-Extract/resolve/main/sample_image.png")
fields_yaml = """wood_color: The overall coloration of the wood surface
wood_texture: The tactile quality of the wood surface
wood_pattern: The pattern types visible on the wood surface"""

system_prompt = f"""Extract the following from the image:

{fields_yaml}

Respond with only a JSON object. Do not include any text outside the JSON."""

conversation = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": [{"type": "image", "image": image}]},
]

inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    tokenize=True,
).to(model.device)

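# Greedy decoding (do_sample=False), as recommended in the prompting recipe.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
# Drop the prompt tokens so only the newly generated JSON is decoded.
response = processor.batch_decode(
    outputs[:, inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)[0]
print(response)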
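
Since the model returns JSON as text, downstream code will usually want to parse and check it before use. The sketch below is one way to do that with the standard library only. The function name `parse_extraction` is hypothetical, the code-fence stripping is a defensive assumption rather than documented model behavior, and the sample reply is made up for illustration (it is not actual model output, and real values may be strings, lists, or other JSON types).

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the example readable


def parse_extraction(response: str, expected_fields: list[str]) -> dict:
    """Parse the reply as a JSON object and verify every requested field is present."""
    text = response.strip()
    # Defensive assumption: tolerate a reply wrapped in a Markdown code fence.
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    data = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = [name for name in expected_fields if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data


# Made-up reply for illustration only.
sample = '{"wood_color": "brown", "wood_texture": "smooth", "wood_pattern": "straight grain"}'
result = parse_extraction(sample, ["wood_color", "wood_texture", "wood_pattern"])
print(result["wood_color"])
```

In the Quick Start, `response` would take the place of `sample`, and the expected field names would be the keys used in the YAML field list.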