Specifications
| Property | Value |
|---|---|
| Parameters | 1.6B total (1.2B LM + ~400M vision encoder) |
| Context Length | 128K tokens |
| Image Input | Single image, dynamic resolution |
| Task | Vision structured extraction |
| Output Format | JSON |
Image Inspection
Extract visual attributes into JSON.
Retail Tagging
Auto-tag product images with structured fields.
Safety Signals
Detect visual events for automated workflows.
Prompting Recipe
Describe the fields to extract as YAML in the system prompt, then provide the image as the user message.Quick Start
- Transformers
- llama.cpp
Install:Run: