Ollama
Ollama is a command-line tool for running LLMs locally with a simple interface. It provides easy model management and serving with an OpenAI-compatible API.
- Simple command-line interface
- Quick local model serving
- Docker-based deployment
Ollama uses GGUF models and supports GPU acceleration (CUDA, Metal, ROCm).
Installation
macOS and Windows: download the installer directly from ollama.com/download.
Linux: install with the official script:
curl -fsSL https://ollama.com/install.sh | sh
Docker: run Ollama in a container, with or without GPU acceleration.
CPU only:
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
NVIDIA GPU:
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
Then run a model:
docker exec -it ollama ollama run hf.co/LiquidAI/LFM2-1.2B-GGUF
See the Ollama Docker documentation for more details.
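To confirm the container is serving, you can query the root endpoint, which returns a plain-text liveness message. A minimal check, assuming the default port mapping shown above:

import urllib.request

# Ollama's root endpoint replies with "Ollama is running" when the server is up
with urllib.request.urlopen("http://localhost:11434/") as resp:
    print(resp.read().decode("utf-8"))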
Using LFM2 Models
Ollama can load GGUF models directly from Hugging Face or from local files.
Running GGUFs
You can run LFM2 models directly from Hugging Face:
ollama run hf.co/LiquidAI/LFM2-1.2B-GGUF
See the Models page for all available GGUF repositories.
To use a local GGUF file, first download a model from Hugging Face:
pip install huggingface-hub
hf download LiquidAI/LFM2-1.2B-GGUF {quantization}.gguf --local-dir .
Replace {quantization} with your preferred quantization level (e.g., q4_k_m, q8_0).
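If you prefer to script the download, the same file can be fetched from Python with huggingface_hub; the filename placeholder below mirrors the CLI example above and must be replaced with a real quantization level:

from huggingface_hub import hf_hub_download

# Downloads a single GGUF file from the repository and returns its local path
path = hf_hub_download(
    repo_id="LiquidAI/LFM2-1.2B-GGUF",
    filename="{quantization}.gguf",  # e.g. a Q4_K_M or Q8_0 file from the repo
    local_dir=".",
)
print(path)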
Then import it through a Modelfile, as described in the next section: ollama run expects a model name, not a raw .gguf path, so local files must first be registered with ollama create.
Custom Setup with Modelfile
To import a local GGUF file, or to customize the chat template and sampling parameters, create a Modelfile.
Create a plain text file named Modelfile (no extension) with the following content:
FROM /path/to/model.gguf
TEMPLATE """<|startoftext|><|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|endoftext|>"
Import the model with the Modelfile:
ollama create my-model -f Modelfile
Then run it:
ollama run my-model
Basic Usage
Interact with models through the command-line interface.
Interactive Chat
ollama run hf.co/LiquidAI/LFM2-1.2B-GGUF
Type your messages and press Enter. Use /bye to exit.
Single Prompt
ollama run hf.co/LiquidAI/LFM2-1.2B-GGUF "What is machine learning?"
If you imported a model with a custom name using a Modelfile, use that name instead (e.g., ollama run my-model).
Serving Models
Ollama automatically starts a server on http://localhost:11434 with an OpenAI-compatible API for programmatic access.
Python Client
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="not-needed"  # Ollama ignores the key, but the client requires one
)

response = client.chat.completions.create(
    model="hf.co/LiquidAI/LFM2-1.2B-GGUF",
    messages=[
        {"role": "user", "content": "Explain quantum computing."}
    ],
    temperature=0.7
)

print(response.choices[0].message.content)
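The endpoint also supports streaming. A minimal sketch with the same client setup, printing tokens as they arrive:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

# stream=True yields incremental deltas instead of a single final message
stream = client.chat.completions.create(
    model="hf.co/LiquidAI/LFM2-1.2B-GGUF",
    messages=[{"role": "user", "content": "Explain quantum computing."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()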
cURL Request Examples
Ollama provides two native API endpoints:
Generate API (simple completion):
curl http://localhost:11434/api/generate -d '{
  "model": "hf.co/LiquidAI/LFM2-1.2B-GGUF",
  "prompt": "What is artificial intelligence?"
}'
Chat API (conversational format):
curl http://localhost:11434/api/chat -d '{
  "model": "hf.co/LiquidAI/LFM2-1.2B-GGUF",
  "messages": [
    {"role": "user", "content": "Hello!"}
  ]
}'
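The native endpoints are plain JSON over HTTP, so they can also be called from Python with only the standard library. A minimal sketch of the chat endpoint; note that the native API streams by default, so "stream": false is set to get a single JSON response:

import json
import urllib.request

payload = {
    "model": "hf.co/LiquidAI/LFM2-1.2B-GGUF",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": False,  # one JSON object instead of a line-delimited stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.load(resp)

print(reply["message"]["content"])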
Vision Models
LFM2-VL GGUF models can also be used for multimodal inference with Ollama.
Interactive Chat with Images
Run a vision model directly and provide images in the chat:
ollama run hf.co/LiquidAI/LFM2-VL-1.6B-GGUF
In the interactive chat, ask about an image by including its file path in your message; Ollama detects the path and attaches the image:
>>> What's in this image? path/to/image.jpg
The path can appear anywhere in the prompt:
>>> Describe the contents of ~/Downloads/photo.png
Using the API
from openai import OpenAI
import base64

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="not-needed"
)

# Encode image to base64
with open("image.jpg", "rb") as image_file:
    image_data = base64.b64encode(image_file.read()).decode("utf-8")

response = client.chat.completions.create(
    model="hf.co/LiquidAI/LFM2-VL-1.6B-GGUF",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}},
                {"type": "text", "text": "What's in this image?"}
            ]
        }
    ]
)

print(response.choices[0].message.content)
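The native chat endpoint accepts images as well, as a list of base64 strings in a message's images field (raw base64, without the data-URL prefix). A standard-library sketch:

import base64
import json
import urllib.request

# Read and base64-encode the image for the native API's "images" field
with open("image.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "model": "hf.co/LiquidAI/LFM2-VL-1.6B-GGUF",
    "messages": [
        {"role": "user", "content": "What's in this image?", "images": [image_b64]}
    ],
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["message"]["content"])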
Model Management
List installed models:
ollama list
Remove a model:
ollama rm hf.co/LiquidAI/LFM2-1.2B-GGUF
Show model information:
ollama show hf.co/LiquidAI/LFM2-1.2B-GGUF
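The same information is available over the native API; for example, the list of installed models can be read from the /api/tags endpoint:

import json
import urllib.request

# GET /api/tags returns metadata for every locally installed model
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    tags = json.load(resp)

for model in tags["models"]:
    print(model["name"])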