Ollama

Ollama is a command-line tool for running LLMs locally. It provides simple model management and serving with an OpenAI-compatible API.

Use Ollama for:
  • Simple command-line interface
  • Quick local model serving
  • Docker-based deployment

Ollama uses GGUF models and supports GPU acceleration (CUDA, Metal, ROCm).

Installation

Download directly from ollama.com/download.
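Once installed, Ollama listens on port 11434 by default. As a quick sanity check, this minimal Python sketch (assuming the default port) confirms the server is reachable; a GET on the root path returns the plain text "Ollama is running":

import urllib.request

# Query the root endpoint of the local Ollama server (default port 11434).
with urllib.request.urlopen("http://localhost:11434") as resp:
    print(resp.read().decode())  # prints: Ollama is running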

Using LFM2 Models

Ollama can load GGUF models directly from Hugging Face or from local files.

Running GGUFs

You can run LFM2 models directly from Hugging Face:

ollama run hf.co/LiquidAI/LFM2-1.2B-GGUF

See the Models page for all available GGUF repositories.

To use a local GGUF file, first download a model from Hugging Face:

pip install huggingface-hub
hf download LiquidAI/LFM2-1.2B-GGUF {quantization}.gguf --local-dir .

Replace {quantization} with your preferred quantization level (e.g., q4_k_m, q8_0).
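If you prefer to script the download, the same step can be done with the huggingface_hub Python API. The filename below keeps the same placeholder; check the repository's file listing for the exact name:

from huggingface_hub import hf_hub_download

# Download a single GGUF file from the repository.
# Substitute the quantization file you want (e.g., q4_k_m, q8_0).
path = hf_hub_download(
    repo_id="LiquidAI/LFM2-1.2B-GGUF",
    filename="{quantization}.gguf",
    local_dir=".",
)
print(path)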

Then run the local model:

ollama run /path/to/model.gguf

Custom Setup with Modelfile

For custom configurations (specific quantization, chat template, or parameters), create a Modelfile.

Create a plain text file named Modelfile (no extension) with the following content:

FROM /path/to/model.gguf

TEMPLATE """<|startoftext|><|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|endoftext|>"

Import the model with the Modelfile:

ollama create my-model -f Modelfile

Then run it:

ollama run my-model
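To confirm the custom template and parameters were applied, you can inspect the imported model programmatically. This sketch assumes the official Python package (pip install ollama):

import ollama

# Show the imported model's configuration, mirroring `ollama show my-model`.
info = ollama.show("my-model")
print(info["template"])
print(info["parameters"])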

Basic Usage

Interact with models through the command-line interface.

Interactive Chat

ollama run hf.co/LiquidAI/LFM2-1.2B-GGUF

Type your messages and press Enter. Use /bye to exit.

Single Prompt

ollama run hf.co/LiquidAI/LFM2-1.2B-GGUF "What is machine learning?"

If you imported a model with a custom name using a Modelfile, use that name instead (e.g., ollama run my-model).
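The same one-shot prompt can be sent programmatically with the official Python package (pip install ollama), a thin wrapper over Ollama's native API:

import ollama

# One-shot completion against the locally served model.
result = ollama.generate(
    model="hf.co/LiquidAI/LFM2-1.2B-GGUF",
    prompt="What is machine learning?",
)
print(result["response"])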

Serving Models

Ollama automatically starts a server on http://localhost:11434 with an OpenAI-compatible API for programmatic access.

Python Client

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="not-needed"
)

response = client.chat.completions.create(
    model="hf.co/LiquidAI/LFM2-1.2B-GGUF",
    messages=[
        {"role": "user", "content": "Explain quantum computing."}
    ],
    temperature=0.7
)

print(response.choices[0].message.content)
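For incremental output, the same endpoint supports streaming through the OpenAI client:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="not-needed"
)

# Stream tokens as they are generated instead of waiting for the full reply.
stream = client.chat.completions.create(
    model="hf.co/LiquidAI/LFM2-1.2B-GGUF",
    messages=[{"role": "user", "content": "Explain quantum computing."}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
print()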

Curl request examples

In addition to the OpenAI-compatible /v1 route, Ollama provides two native API endpoints:

Generate API (simple completion):

curl http://localhost:11434/api/generate -d '{
  "model": "hf.co/LiquidAI/LFM2-1.2B-GGUF",
  "prompt": "What is artificial intelligence?"
}'

Chat API (conversational format):

curl http://localhost:11434/api/chat -d '{
  "model": "hf.co/LiquidAI/LFM2-1.2B-GGUF",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ]
}'

Vision Models

LFM2-VL GGUF models can also be used for multimodal inference with Ollama.

Interactive Chat with Images

Run a vision model directly and provide images in the chat:

ollama run hf.co/LiquidAI/LFM2-VL-1.6B-GGUF

In the interactive chat, include an image file path directly in your prompt and Ollama will attach the image automatically:

>>> What's in this image? path/to/image.jpg

Relative and absolute paths both work:

>>> Describe the contents of ~/Downloads/photo.png

Using the API
from openai import OpenAI
import base64

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="not-needed"
)

# Encode image to base64
with open("image.jpg", "rb") as image_file:
    image_data = base64.b64encode(image_file.read()).decode("utf-8")

response = client.chat.completions.create(
    model="hf.co/LiquidAI/LFM2-VL-1.6B-GGUF",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}},
                {"type": "text", "text": "What's in this image?"}
            ]
        }
    ]
)

print(response.choices[0].message.content)
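Alternatively, the official Python package accepts image file paths directly through the images field of a message, skipping the manual base64 step:

import ollama

# Send an image by path; the library handles encoding internally.
response = ollama.chat(
    model="hf.co/LiquidAI/LFM2-VL-1.6B-GGUF",
    messages=[
        {
            "role": "user",
            "content": "What's in this image?",
            "images": ["image.jpg"],
        }
    ],
)
print(response["message"]["content"])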

Model Management

List installed models:

ollama list

Remove a model:

ollama rm hf.co/LiquidAI/LFM2-1.2B-GGUF

Show model information:

ollama show hf.co/LiquidAI/LFM2-1.2B-GGUF
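
These management commands are also available from the Python package, for example:

import ollama

# List installed models, mirroring `ollama list`.
for model in ollama.list()["models"]:
    print(model)

# Remove a model, mirroring `ollama rm`.
ollama.delete("hf.co/LiquidAI/LFM2-1.2B-GGUF")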