LM Studio

LM Studio is a desktop application for running LLMs locally with a graphical interface.

Use LM Studio if you want:
  • No command-line setup
  • Easy model discovery and download
  • Quick testing with a built-in chat interface

Installation

Download and install LM Studio directly from lmstudio.ai.

Downloading Models

  1. Open LM Studio and click the Search tab (🔍)
  2. Search for "LiquidAI" or "LFM2"
  3. Select a model and quantization level (Q4_K_M recommended)
  4. Click Download

See the Models page for all available GGUF models.

Using the Chat Interface

  1. Go to the Chat tab (💬)
  2. Select your model from the dropdown
  3. Adjust parameters (temperature, max_tokens, top_p) in the sidebar
  4. Start chatting

Generation Parameters

Control text generation behavior using the GUI sidebar or API parameters. Key parameters:

  • temperature (float, default 1.0): Controls randomness (0.0 = deterministic, higher = more random). Typical range: 0.1-2.0
  • top_p (float, default 1.0): Nucleus sampling - limits to tokens with cumulative probability ≤ top_p. Typical range: 0.1-1.0
  • top_k (int, default 40): Limits to top-k most probable tokens. Typical range: 1-100
  • max_tokens (int): Maximum number of tokens to generate
  • repetition_penalty (float, default 1.0): Penalty for repeating tokens (>1.0 = discourage repetition). Typical range: 1.0-1.5
  • stop (str or list[str]): Strings that terminate generation when encountered

Via the OpenAI-compatible API (client setup is shown under Running the Server below):

response = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "What is machine learning?"}],
    temperature=0.7,
    top_p=0.9,
    max_tokens=512,
    # top_k and repetition_penalty are not part of the OpenAI schema, so the
    # OpenAI Python client requires them to be sent via extra_body
    extra_body={"top_k": 40, "repetition_penalty": 1.1},
)
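
The stop parameter from the list above is part of the standard OpenAI schema, so it can be passed directly. A minimal sketch (the prompt and stop strings are illustrative):

response = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "List three fruits."}],
    max_tokens=256,
    # Generation halts as soon as either string appears in the output
    stop=["\n\n", "4."],
)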

Running the Server

Start an OpenAI-compatible server for programmatic access:

  1. Go to the Developer tab (⚙️)
  2. Select your model
  3. Click Start Server (runs at http://localhost:1234)
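
To confirm the server is reachable before wiring up a client, you can query the /v1/models endpoint, which OpenAI-compatible servers expose to list available models. A minimal sketch using only the Python standard library (the port assumes the default shown above):

import json
import urllib.request

# List the models the local server exposes
with urllib.request.urlopen("http://localhost:1234/v1/models") as resp:
    print(json.dumps(json.load(resp), indent=2))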

Use the OpenAI Python client:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="not-needed"
)

response = client.chat.completions.create(
    model="local-model",  # Any string works
    messages=[
        {"role": "user", "content": "What is machine learning?"}
    ],
    temperature=0.7,
    max_tokens=512
)

print(response.choices[0].message.content)

Streaming Responses

# Reuses the client configured above
stream = client.chat.completions.create(
    model="local-model",
    messages=[
        {"role": "user", "content": "Tell me a story."}
    ],
    stream=True
)

for chunk in stream:
    # delta.content is None for role/finish chunks, so guard before printing
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

cURL Request Example

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local-model",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7
  }'

Vision Models

Search for "LiquidAI LFM2-VL" to download vision models. In the Chat tab:

  • Drag and drop images into the chat
  • Click the image icon to upload
  • Provide image URLs

Using the API

from openai import OpenAI
import base64

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="not-needed"
)

# Encode image to base64
with open("image.jpg", "rb") as image_file:
    image_data = base64.b64encode(image_file.read()).decode("utf-8")

response = client.chat.completions.create(
    model="local-model",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}},
                {"type": "text", "text": "What's in this image?"}
            ]
        }
    ]
)

print(response.choices[0].message.content)
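
To reference a hosted image instead of embedding base64 (the "Provide image URLs" option above), the URL can be passed directly in the image_url field, assuming the server fetches remote URLs the way the chat interface does. A minimal sketch with a placeholder URL:

response = client.chat.completions.create(
    model="local-model",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},  # placeholder URL
                {"type": "text", "text": "Describe this image."}
            ]
        }
    ]
)

print(response.choices[0].message.content)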

Tips

  • GPU Acceleration: LM Studio automatically detects and uses available GPUs
  • Model Management: Delete downloaded models from the My Models section to free disk space
  • Performance: Adjust the number of GPU layers in the server settings to balance speed against memory use
  • Quantization: Q4 quantizations are smaller and faster; Q6/Q8 retain more quality at a larger size