Specifications
| Property | Value |
|---|---|
| Parameters | 450M total (350M LM + ~100M vision encoder) |
| Context Length | 128K tokens |
| Image Input | Single image, dynamic resolution |
| Task | Vision structured extraction |
| Output Format | JSON |
Edge Extraction
Structured image extraction for small devices.
Visual Tagging
Label image attributes with schema control.
Low Latency
Fast extraction for high-volume workflows.
Prompting Recipe
Describe the fields to extract as YAML in the system prompt, then provide the image as the user message.Quick Start
- Transformers
- llama.cpp
Install:Run: