If you've used a cloud chat-completion API (OpenAI, Anthropic, etc.), most of LEAP's shape will be familiar: async streaming, role-tagged messages, JSON-serializable history. The biggest difference: you load the model explicitly, locally, before generation, instead of pointing a client at a remote endpoint. This page maps the OpenAI Python client's flow onto the LEAP SDK across Swift, Kotlin (Android), and Kotlin (JVM / native). For OpenAI compatibility on the client side, also see OpenAI-Compatible Client.

Reference: an OpenAI streaming call

from openai import OpenAI
client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Say 'double bubble bath' ten times fast."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
print("\nGeneration done!")

1. Load the model (vs. construct a client)

Cloud APIs create a thin client that points at a remote endpoint. LEAP downloads the model the first time and loads it into a ModelRunner, which typically takes a few seconds depending on model size and device.
client = OpenAI()
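The LEAP side, sketched here in Swift using the same calls as the full example later on this page (the model name, quantization, and cache directory are illustrative):

let modelsDir = NSTemporaryDirectory() + "leap_models"   // any writable cache directory works
let downloader = ModelDownloader(config: LeapDownloaderConfig(saveDir: modelsDir))
// First call downloads the weights, then loads them into memory.
let runner = try await downloader.loadModel(
    modelName: "LFM2.5-1.2B-Instruct",
    quantizationType: "Q4_K_M"
)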
The returned ModelRunner plays the same role as the cloud API's client object, except that it carries the model weights. Release it and you'll have to load again before generating.

2. Request generation

The cloud API takes a messages array and returns a stream. LEAP attaches messages to a Conversation (so history is tracked automatically) and returns an async stream from generateResponse(...).
stream = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "..."}],
    stream=True,
)
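The LEAP equivalent in Swift, reusing the runner from step 1 (a sketch; the calls mirror the full example below):

let conversation = runner.createConversation()
let message = ChatMessage(role: .user, content: [.text("Say 'double bubble bath' ten times fast.")])
// Returns an async stream of MessageResponse values (consumed in step 3).
let stream = conversation.generateResponse(message: message)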
You don't pass the model name on each call; the Conversation is already bound to the runner that loaded it.

3. Consume the stream

Cloud APIs deliver deltas; you concatenate them. LEAP delivers MessageResponse values; each variant maps to a UI update, audio frame, tool call, or completion marker.
for chunk in stream:
    if chunk.choices:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
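The LEAP counterpart in Swift. The .chunk case matches the full example below; the .complete case name is inferred from the Complete.stats row in the comparison table, so treat it as an assumption:

// Inside an async context (see the next section for the full Task wiring).
for try await response in conversation.generateResponse(message: message) {
    switch onEnum(of: response) {
    case .chunk(let chunk):
        print(chunk.text, terminator: "")   // incremental text, like the OpenAI delta
    case .complete:
        print("\nGeneration done!")         // completion marker; carries token-usage stats
    default:
        break                               // other variants: audio frames, function calls, ...
    }
}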

4. Async context

Both LEAP and the OpenAI Python streaming client run inside an async context. The SDK's call shape mirrors each language's idiomatic concurrency primitives.
In Swift, wrap calls in a Task. SwiftUI's .task modifier on a view is the most common entry point. @MainActor view models keep model state on the main thread; the for try await loop suspends the task until the next chunk arrives.
import Foundation
import Combine   // ObservableObject / @Published
import LeapSDK   // assumed module name; adjust to match your package

@MainActor
final class ChatViewModel: ObservableObject {
    @Published var currentResponse = ""
    private var runner: ModelRunner?
    private var conversation: Conversation?
    private let downloader: ModelDownloader = {
        let caches = FileManager.default.urls(for: .cachesDirectory, in: .userDomainMask).first!.path
        let modelsDir = (caches as NSString).appendingPathComponent("leap_models")
        return ModelDownloader(config: LeapDownloaderConfig(saveDir: modelsDir))
    }()

    func loadModel() async {
        runner = try? await downloader.loadModel(
            modelName: "LFM2.5-1.2B-Instruct",
            quantizationType: "Q4_K_M"
        )
        conversation = runner?.createConversation()
    }

    func sendMessage(_ text: String) {
        guard let conversation else { return }
        Task {
            let message = ChatMessage(role: .user, content: [.text(text)])
            for try await response in conversation.generateResponse(message: message) {
                if case .chunk(let c) = onEnum(of: response) {
                    currentResponse += c.text
                }
            }
        }
    }
}

What's the same

Concept | OpenAI | LEAP
Role-tagged messages | {"role": "user", "content": "..."} | ChatMessage(role: .user, content: [.text("...")])
Streaming responses | stream=True iterator | AsyncThrowingStream (Swift) / Flow (Kotlin)
Function calling | Tool definitions + tool_calls field | registerFunction(LeapFunction) + MessageResponse.functionCalls
Structured output | response_format = json_schema | GenerationOptions.setResponseFormat(type:)
Token usage stats | usage object on completion | Complete.stats (promptTokens, completionTokens, tokenPerSecond)

What's different

  • No remote endpoint. You ship the model with the app (or download it the first time it runs). Latency is bounded by device CPU/GPU, not network round-trips.
  • Explicit lifecycle. Hold a ModelRunner reference; unload() when done (see the sketch after this list). Cloud clients never load anything explicitly.
  • Multimodal inputs go in the content array, same as OpenAI. Image and audio parts use the same OpenAI image_url / input_audio wire format.
  • Companion files for multimodal models. Vision and audio-capable models need an mmproj (vision) and/or audio decoder/tokenizer co-located on disk. Manifest-based loading handles this automatically; loadSimpleModel accepts explicit mmprojPath / audioDecoderPath / audioTokenizerPath.
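A minimal lifecycle sketch for the second bullet, in Swift; the exact unload() signature is an assumption, so check the SDK reference:

func releaseModel() {
    conversation = nil    // drop the history bound to this runner
    runner?.unload()      // release the weights (assumed signature)
    runner = nil          // load again before the next generation
}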

Next steps