# Ollama

> Ollama is a command-line tool for running LLMs locally with a simple interface. It provides easy model management and serving with an OpenAI-compatible API.

<Tip>
  Use Ollama for quick local model serving with a simple CLI or Docker-based deployment.
</Tip>

Ollama uses GGUF models and supports GPU acceleration (CUDA, Metal, ROCm).

<Warning>
  The official Ollama v0.17.0 release (the latest stable version at the time of writing) from [ollama.com](https://ollama.com) fails with a `missing tensor 'output_norm.weight'` error on the `lfm2moe` architecture. This affects all LFM MoE models (e.g., LFM2-24B-A2B, LFM2-8B-A1B). To run any LFM MoE model, install [v0.17.1-rc0](https://github.com/ollama/ollama/releases/tag/v0.17.1-rc0) or later.
</Warning>

<div className="colab-link">
  <a href="https://colab.research.google.com/github/Liquid4All/docs/blob/main/notebooks/LFM2_Inference_with_Ollama.ipynb" target="_blank">
    <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" />
  </a>
</div>

## Installation

<Tabs>
  <Tab title="macOS and Windows">
    Download directly from [ollama.com/download](https://ollama.com/download).
  </Tab>

  <Tab title="Linux">
    ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
    curl -fsSL https://ollama.com/install.sh | sh
    ```
  </Tab>

  <Tab title="Docker">
    Run Ollama inside a Docker container, with or without GPU acceleration:

    **CPU only:**

    ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
    docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
    ```

    **NVIDIA GPU:**

    ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
    docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
    ```

    Then run a model:

    ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
    docker exec -it ollama ollama run hf.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF
    ```

    See the [Ollama Docker documentation](https://ollama.com/blog/ollama-is-now-available-as-an-official-docker-image) for more details.
  </Tab>
</Tabs>

## Using LFM2 Models

Ollama can load GGUF models directly from Hugging Face or from local files.

### Running GGUFs

You can run LFM2 models directly from Hugging Face:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
ollama run hf.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF
```

See the [Models page](/lfm/models/complete-library) for all available GGUF repositories.

To use a local GGUF file, first download a model from Hugging Face:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
uv pip install huggingface-hub
hf download LiquidAI/LFM2.5-1.2B-Instruct-GGUF {quantization}.gguf --local-dir .
```

Replace `{quantization}` with your preferred quantization level (e.g., `q4_k_m`, `q8_0`).

Then run the local model:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
ollama run /path/to/model.gguf
```

<Accordion title="Custom Setup with Modelfile">
  For custom configurations (specific quantization, chat template, or parameters), create a Modelfile.

  Create a plain text file named `Modelfile` (no extension) with the following content:

  ```
  FROM /path/to/model.gguf

  TEMPLATE """<|startoftext|><|im_start|>system
  {{ .System }}<|im_end|>
  <|im_start|>user
  {{ .Prompt }}<|im_end|>
  <|im_start|>assistant
  """

  PARAMETER temperature 0.1
  PARAMETER top_k 50
  PARAMETER repeat_penalty 1.05
  PARAMETER stop "<|im_end|>"
  PARAMETER stop "<|endoftext|>"
  ```

  Import the model with the Modelfile:

  ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
  ollama create my-model -f Modelfile
  ```

  Then run it:

  ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
  ollama run my-model
  ```
</Accordion>
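
When several variants of a model are needed (different sampling parameters or stop tokens), the Modelfile from the custom setup above can be generated instead of written by hand. The following is a minimal sketch, not an official tool: `render_modelfile` and `LFM2_TEMPLATE` are hypothetical names introduced for illustration, and the path is the same placeholder used above.

```python
from typing import Mapping, Sequence, Union

ParamValue = Union[str, int, float]


def render_modelfile(
    gguf_path: str,
    template: str,
    parameters: Mapping[str, ParamValue],
    stop_tokens: Sequence[str] = (),
) -> str:
    """Return Modelfile text with FROM, TEMPLATE, and PARAMETER lines."""
    lines = [f"FROM {gguf_path}", "", f'TEMPLATE """{template}"""', ""]
    for name, value in parameters.items():
        lines.append(f"PARAMETER {name} {value}")
    for token in stop_tokens:
        # Each stop sequence gets its own quoted PARAMETER line.
        lines.append(f'PARAMETER stop "{token}"')
    return "\n".join(lines) + "\n"


# Chat template copied from the Modelfile shown above.
LFM2_TEMPLATE = (
    "<|startoftext|><|im_start|>system\n"
    "{{ .System }}<|im_end|>\n"
    "<|im_start|>user\n"
    "{{ .Prompt }}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

modelfile_text = render_modelfile(
    "/path/to/model.gguf",  # placeholder path, replace with your GGUF file
    LFM2_TEMPLATE,
    {"temperature": 0.1, "top_k": 50, "repeat_penalty": 1.05},
    stop_tokens=["<|im_end|>", "<|endoftext|>"],
)
```

Write `modelfile_text` to a file named `Modelfile` and import it with `ollama create my-model -f Modelfile` as described above.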

## Basic Usage

Interact with models through the command-line interface.

### Interactive Chat

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
ollama run hf.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF
```

Type your messages and press Enter. Use `/bye` to exit.

### Single Prompt

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
ollama run hf.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF "What is machine learning?"
```

If you imported a model with a custom name using a Modelfile, use that name instead (e.g., `ollama run my-model`).
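
The single-prompt form is convenient for scripting. As a rough sketch, it can be wrapped in a subprocess call from Python, assuming the `ollama` binary is on your `PATH`. The helper names `build_run_command` and `ask_once` are hypothetical and not part of Ollama.

```python
import subprocess
from typing import List


def build_run_command(model: str, prompt: str) -> List[str]:
    """Build the argument list for a one-shot `ollama run MODEL PROMPT` call."""
    return ["ollama", "run", model, prompt]


def ask_once(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Run a single prompt through the CLI and return its standard output."""
    completed = subprocess.run(
        build_run_command(model, prompt),
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the output as str
        check=True,           # raise CalledProcessError on a non-zero exit code
        timeout=timeout,
    )
    return completed.stdout.strip()
```

For example, `ask_once("hf.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF", "What is machine learning?")` returns the model's answer as a string. Passing the arguments as a list avoids shell quoting problems in the prompt.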

## Serving Models

Ollama automatically starts a server on `http://localhost:11434`. For programmatic access, it exposes an OpenAI-compatible API under the `/v1` path.
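
Before sending requests from a script, it can help to confirm that the server is reachable. The sketch below is one possible check using only the standard library. It assumes the server answers a plain GET on its root URL with HTTP 200, and `ollama_is_running` is a hypothetical helper name.

```python
import http.client
import urllib.error
import urllib.request


def ollama_is_running(
    base_url: str = "http://localhost:11434", timeout: float = 2.0
) -> bool:
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or a non-success HTTP status.
        return False
```

If the function returns `False`, start the server (for example with `ollama serve`) before using the clients below.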

### Python Client

```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="not-needed"
)

response = client.chat.completions.create(
    model="hf.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF",
    messages=[
        {"role": "user", "content": "Explain quantum computing."}
    ],
    temperature=0.1,
    extra_body={"top_k": 50, "repeat_penalty": 1.05},
)
print(response.choices[0].message.content)
```

<Accordion title="Curl request examples">
  Ollama also provides its own native API. Two of its endpoints generate text:

  **Generate API** (simple completion):

  ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
  curl http://localhost:11434/api/generate -d '{
    "model": "hf.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF",
    "prompt": "What is artificial intelligence?"
  }'
  ```

  **Chat API** (conversational format):

  ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
  curl http://localhost:11434/api/chat -d '{
    "model": "hf.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
  ```
</Accordion>
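
By default, the native endpoints stream their answer as newline-delimited JSON, one object per line, which is why the curl output above arrives in small pieces. The sketch below shows one way to reassemble the text, assuming each chunk carries either a `response` field (Generate API) or a `message.content` field (Chat API) and that the last chunk has `"done": true`. The function name and the sample lines are hand-written for illustration and are not real model output.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate the text fragments from a streamed native API response."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        if "response" in chunk:  # Generate API chunk
            parts.append(chunk["response"])
        elif "message" in chunk:  # Chat API chunk
            parts.append(chunk["message"].get("content", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


# Illustrative, hand-written chunks in the assumed format.
sample_generate = [
    '{"model": "demo", "response": "Artificial ", "done": false}',
    '{"model": "demo", "response": "intelligence", "done": false}',
    '{"model": "demo", "response": "", "done": true}',
]
sample_chat = [
    '{"model": "demo", "message": {"role": "assistant", "content": "Hi"}, "done": false}',
    '{"model": "demo", "message": {"role": "assistant", "content": "!"}, "done": true}',
]
```

To receive a single JSON object instead of a stream, add `"stream": false` to the request body.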

## Vision Models

LFM2-VL GGUF models can also be used for multimodal inference with Ollama.

<Accordion title="Interactive Chat with Images">
  Run a vision model directly and provide images in the chat:

  ```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
  ollama run hf.co/LiquidAI/LFM2.5-VL-1.6B-GGUF
  ```

  In the interactive chat, you can ask questions about images using the `/image` command followed by the file path:

  ```
  >>> /image path/to/image.jpg What's in this image?
  ```

  Or provide the image path directly in your prompt:

  ```
  >>> Describe the contents of ~/Downloads/photo.png
  ```
</Accordion>

<Accordion title="Using the API">
  ```python theme={"theme":{"light":"github-light","dark":"github-dark"}}
  from openai import OpenAI
  import base64

  client = OpenAI(
      base_url="http://localhost:11434/v1",
      api_key="not-needed"
  )

  # Encode image to base64
  with open("image.jpg", "rb") as image_file:
      image_data = base64.b64encode(image_file.read()).decode("utf-8")

  response = client.chat.completions.create(
      model="hf.co/LiquidAI/LFM2.5-VL-1.6B-GGUF",
      messages=[
          {
              "role": "user",
              "content": [
                  {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}},
                  {"type": "text", "text": "What's in this image?"}
              ]
          }
      ]
  )
  print(response.choices[0].message.content)
  ```
</Accordion>
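
The API example above hard-codes `image/jpeg` in the data URL. For mixed image formats, a small helper can pick the MIME type from the file extension. This is a minimal sketch using the standard `mimetypes` module; `image_data_url` and `image_content_part` are hypothetical names introduced here.

```python
import base64
import mimetypes


def image_data_url(filename: str, data: bytes) -> str:
    """Encode raw image bytes as a data URL, guessing the MIME type from the name."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {filename!r}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_content_part(filename: str, data: bytes) -> dict:
    """Build an `image_url` content part like the one used in the request above."""
    return {"type": "image_url", "image_url": {"url": image_data_url(filename, data)}}
```

Read the file with `open(path, "rb").read()` and place the returned dictionary in the `content` list next to the text part.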

## Model Management

List installed models:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
ollama list
```

Remove a model:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
ollama rm hf.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF
```

Show model information:

```bash theme={"theme":{"light":"github-light","dark":"github-dark"}}
ollama show hf.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF
```
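
The installed models can also be listed programmatically. The sketch below queries the native `/api/tags` endpoint and assumes its response has the shape `{"models": [{"name": ...}, ...]}`. The helper names and the sample payload are illustrative only.

```python
import json
import urllib.request
from typing import List


def installed_model_names(tags_payload: dict) -> List[str]:
    """Extract the sorted model names from a parsed /api/tags response."""
    return sorted(entry["name"] for entry in tags_payload.get("models", []))


def fetch_installed_model_names(base_url: str = "http://localhost:11434") -> List[str]:
    """Query a running Ollama server for its installed models."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return installed_model_names(json.load(response))


# Hand-written payload in the assumed response shape.
sample_payload = {
    "models": [
        {"name": "my-model:latest"},
        {"name": "hf.co/LiquidAI/LFM2.5-1.2B-Instruct-GGUF:latest"},
    ]
}
```

Calling `fetch_installed_model_names()` requires a running server; `installed_model_names(sample_payload)` works offline and shows the parsing step.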