# Prompting Guide

> This guide covers how to effectively use system prompts, user prompts, and assistant prompts with LFM2 models, along with an overview of sampling parameters and special prompting recipes for specific models.

## Prompt Roles

LFM2 models use a structured conversation format with three prompt roles:

* **`system`** (optional) - Sets assistant behavior, context, and instructions. Use for personality, task context, output format, or constraints.
* **`user`** - Contains the question, instruction, or request from the user.
* **`assistant`** - Contains the model's responses. Include earlier assistant turns for multi-turn conversations and few-shot prompting, or end the message list with a partial assistant message to prefill the response (e.g., a JSON opening brace) for the model to continue from.

**Example:**

```python
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "How do I sort a list in Python?"}
]
```

<Accordion title="Additional examples: few-shot prompting and prefill">
  **Multi-turn conversations / Few-shot prompting:**

  Continue a previous conversation or provide example interactions to guide the model's behavior. The model conditions on the conversation history and applies the patterns it sees to new inputs.

  ```python
  messages = [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What are the benefits of exercise?"},
      {"role": "assistant", "content": "Exercise has many benefits, including:\n1. Improved cardiovascular health\n2. Stronger muscles and bones"},  # Earlier assistant turn, kept as context
      {"role": "user", "content": "Tell me more about cardiovascular health."}
  ]
  ```

  Or provide few-shot examples:

  ```python
  messages = [
      {"role": "system", "content": "You are a helpful assistant that formats dates."},
      {"role": "user", "content": "2024-01-15"},
      {"role": "assistant", "content": "January 15, 2024"},
      {"role": "user", "content": "2024-12-25"},
      {"role": "assistant", "content": "December 25, 2024"},
      {"role": "user", "content": "2024-03-08"}  # Model follows the pattern
  ]
  ```

  **Prefill for structured output:**

  Start the model with a specific format or structure (e.g., JSON opening brace) to guide it toward structured outputs.

  ```python
  messages = [
      {"role": "system", "content": "Extract information and return as JSON."},
      {"role": "user", "content": "Extract the name and age from: John is 30 years old."},
      {"role": "assistant", "content": "{\n  \"name\": "}  # Prefill with JSON structure
  ]
  ```
</Accordion>
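Prefill works because the final assistant turn is left open, so generation continues it instead of starting a new reply. The sketch below illustrates that idea and shows how to rejoin the prefill with the generated text before parsing. The ChatML-style markers, `render_chat`, and `parse_prefilled_json` are assumptions made for illustration; the real template ships with the model's tokenizer, and some inference stacks return the prefill as part of the output already.

```python
import json


def render_chat(messages):
    """Render messages with illustrative ChatML-style markers (assumed format)."""
    parts = []
    for i, msg in enumerate(messages):
        is_last = i == len(messages) - 1
        turn = f"<|im_start|>{msg['role']}\n{msg['content']}"
        if is_last and msg["role"] == "assistant":
            # Prefill: leave the assistant turn open so generation continues it.
            parts.append(turn)
        else:
            parts.append(turn + "<|im_end|>\n")
    if messages[-1]["role"] != "assistant":
        # No prefill: open a fresh assistant turn for the model to fill.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


def parse_prefilled_json(prefill, continuation):
    """Join the prefill with the generated continuation and parse the result."""
    return json.loads(prefill + continuation)


prefill = "{\n  \"name\": "
messages = [
    {"role": "system", "content": "Extract information and return as JSON."},
    {"role": "user", "content": "Extract the name and age from: John is 30 years old."},
    {"role": "assistant", "content": prefill},
]
prompt = render_chat(messages)

# Hypothetical model output, written by hand for this example.
continuation = '"John",\n  "age": 30\n}'
result = parse_prefilled_json(prefill, continuation)
```

If the continuation is cut off by the token limit, `json.loads` raises `json.JSONDecodeError`, so callers may want to catch it and retry.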

For structured generation with schema validation, see [Outlines](https://dottxt-ai.github.io/outlines).

## Text Models

The following sampling parameters control text generation, trading off creativity, determinism, and quality:

* **`temperature`** (0.0-2.0) - Controls randomness. Lower (0.1-0.7) = more deterministic; higher (0.8-1.5) = more creative.
* **`top_p`** (0.0-1.0) - Nucleus sampling: samples from the smallest set of tokens whose cumulative probability reaches `top_p`. Lower (0.1-0.5) = focused; higher (0.7-0.95) = diverse.
* **`top_k`** - Restricts sampling to the `k` most probable tokens. Lower (10-50) = high-probability tokens only; higher (50-100) = more diverse.
* **`min_p`** (0.0-1.0) - Filters out tokens whose probability is below `min_p * max_probability`, where `max_probability` is the probability of the most likely token. Maintains quality while allowing diversity.
* **`repetition_penalty`** (1.0+) - Reduces repetition. 1.0 = no penalty; >1.0 = discourages repeated tokens.
* **`max_tokens`** / **`max_new_tokens`** - Maximum number of tokens to generate.
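To make the filters above concrete, here is a minimal pure-Python sketch of how `temperature`, `top_k`, `top_p`, and `min_p` reshape a toy next-token distribution. The function names are hypothetical, and every filter is applied to the same temperature-scaled distribution for simplicity; real inference engines chain the filters in an engine-specific order and may differ in details.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the result."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_probs(probs, top_k=None, top_p=None, min_p=None):
    """Zero out filtered tokens and renormalize the survivors."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order)
    if top_k is not None:
        # Keep only the k most probable tokens.
        keep &= set(order[:top_k])
    if top_p is not None:
        # Keep the smallest prefix whose cumulative probability reaches top_p.
        cumulative, nucleus = 0.0, set()
        for i in order:
            nucleus.add(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        keep &= nucleus
    if min_p is not None:
        # Drop tokens below min_p times the probability of the top token.
        threshold = min_p * max(probs)
        keep = {i for i in keep if probs[i] >= threshold}
    kept_total = sum(probs[i] for i in keep)
    return [probs[i] / kept_total if i in keep else 0.0 for i in range(len(probs))]


logits = [2.0, 1.0, 0.0, -1.0]  # toy logits for a four-token vocabulary
probs = softmax(logits)
sharp = softmax(logits, temperature=0.1)    # nearly all mass on the top token
greedy_like = filter_probs(probs, top_p=0.1)  # only the top token survives
```

A very low `top_p` such as 0.1 behaves almost like greedy decoding whenever the top token already holds more than 10% of the probability mass, as it does in this toy distribution.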

Parameter names and syntax vary by platform. See [Transformers](/deployment/gpu-inference/transformers), [vLLM](/deployment/gpu-inference/vllm), or [llama.cpp](/deployment/on-device/llama-cpp) for details.

### Recommended Settings Text

**For LFM2.5-1.2B-Instruct:**

* `temperature=0.1`
* `top_k=50`
* `repetition_penalty=1.05`

**For LFM2.5-1.2B-Thinking:**

* `temperature=0.1`
* `top_k=50`
* `top_p=0.1`
* `repetition_penalty=1.05`

**For LFM2 text models:**

* `temperature=0.3`
* `min_p=0.15`
* `repetition_penalty=1.05`
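One way to keep these recommendations in one place is a small lookup table. The sketch below copies the values from the lists above; the helper name `sampling_kwargs`, the default token limit of 512, and the fallback to the LFM2 settings for any unlisted model name are assumptions made for illustration.

```python
# Recommended sampling settings, copied from the lists above.
RECOMMENDED_TEXT_SETTINGS = {
    "LFM2.5-1.2B-Instruct": {
        "temperature": 0.1,
        "top_k": 50,
        "repetition_penalty": 1.05,
    },
    "LFM2.5-1.2B-Thinking": {
        "temperature": 0.1,
        "top_k": 50,
        "top_p": 0.1,
        "repetition_penalty": 1.05,
    },
}

# Settings for LFM2 text models, used here as the fallback (an assumption).
LFM2_TEXT_DEFAULT = {"temperature": 0.3, "min_p": 0.15, "repetition_penalty": 1.05}


def sampling_kwargs(model_name, max_tokens=512, max_tokens_key="max_new_tokens"):
    """Return a fresh dict of sampling settings for the given model name.

    max_tokens_key selects the platform-specific name of the token limit,
    for example "max_new_tokens" or "max_tokens".
    """
    settings = dict(RECOMMENDED_TEXT_SETTINGS.get(model_name, LFM2_TEXT_DEFAULT))
    settings[max_tokens_key] = max_tokens
    return settings


thinking = sampling_kwargs("LFM2.5-1.2B-Thinking")
lfm2 = sampling_kwargs("LFM2-1.2B", max_tokens_key="max_tokens")
```

With Transformers, a dict like `thinking` would typically be unpacked into `model.generate(...)` together with `do_sample=True`; vLLM's `SamplingParams` names the limit `max_tokens` instead. Check the platform pages linked above for the exact parameter names.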

## Vision Models

LFM2-VL models use a **variable-resolution encoder**, which lets you trade quality for speed by adjusting how images are tokenized.

### Image Token Management

Control image tokenization with:

* **`min_image_tokens`** - Minimum tokens for image encoding
* **`max_image_tokens`** - Maximum tokens for image encoding
* **`do_image_splitting`** - Split large images into 512×512 patches

**How it works:** Large images are split into non-overlapping patches, then a 2-layer MLP connector with pixel unshuffle reduces the number of image tokens (e.g., a 256×384 image → 96 tokens, a 1000×3000 image → 1,020 tokens). Adjust `min_image_tokens` and `max_image_tokens` to balance quality vs. speed.
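The 256×384 → 96 figure can be reproduced with a rough estimate. The sketch below assumes 16-pixel encoder patches and a 2× pixel unshuffle, so each final token covers a 32×32 pixel area; neither value is stated in this guide, and both function names are hypothetical. It covers single-tile images only and does not model image splitting, so it will not reproduce the 1000×3000 figure.

```python
import math


def estimate_image_tokens(width, height, patch_size=16, unshuffle_factor=2):
    """Estimate the token count for an image processed as one tile.

    patch_size and unshuffle_factor are assumed values, not taken from this guide.
    """
    pixels_per_token_side = patch_size * unshuffle_factor  # 32 px under the assumptions
    cols = math.ceil(width / pixels_per_token_side)
    rows = math.ceil(height / pixels_per_token_side)
    return cols * rows


def clamp_to_budget(tokens, min_image_tokens, max_image_tokens):
    """Approximate the effect of the token budget by clamping the estimate.

    A real processor resizes the image so the count lands in range;
    clamping only approximates that behavior.
    """
    return max(min_image_tokens, min(tokens, max_image_tokens))


tokens = estimate_image_tokens(256, 384)  # 8 columns * 12 rows
budgeted = clamp_to_budget(tokens, min_image_tokens=64, max_image_tokens=256)
```

Under these assumptions, a tighter budget such as `max_image_tokens=64` would force the same image to be downscaled before encoding, which is where the speed gain comes from.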

**Example configurations:**

```python
# High quality (slower)
max_image_tokens = 256
min_image_tokens = 128

# Balanced
max_image_tokens = 128
min_image_tokens = 64

# Fast (lower quality)
max_image_tokens = 64
min_image_tokens = 32
```

### Recommended Settings Vision

**For vision models:**

* `temperature=0.1`
* `min_p=0.15`
* `repetition_penalty=1.05`
* `min_image_tokens=64`
* `max_image_tokens=256`
* `do_image_splitting=True`

<Note>
  **Liquid Nanos** (task-specific models like LFM2-Extract, LFM2-RAG, LFM2-Tool, etc.) may have special prompting requirements and different generation parameters. For the best usage guidelines, refer to the individual model cards on the [Liquid Nanos](/lfm/models/liquid-nanos) page.
</Note>