
This guide walks through the patterns for building a real AI agent — multi-turn conversation, function calling with tool dispatch, multimodal inputs, and a complete view-model wiring. The cross-platform pages cover individual APIs in depth — start here for the full picture, then drill into the dedicated references when you need details.

Architecture

Downloader (LeapModelDownloader on Android · ModelDownloader on iOS / macOS · LeapDownloader on JVM/native)
        ↓
ModelRunner
        ↓
Conversation
        ↓
MessageResponse (streaming: Chunk · ReasoningChunk · FunctionCalls · AudioSample · Complete)
The ModelRunner owns native memory; the Conversation holds chat history; the MessageResponse stream delivers incremental output. Same shape on every platform.

| Concern | Where it lives |
| --- | --- |
| Install & set up the dependency | Quick Start |
| Load the model | Model Loading |
| Drive the streaming loop | Conversation & Generation |
| Define and dispatch tools | Function Calling |
| Force structured JSON output | Constrained Generation |
| Voice UX | Voice Assistant Widget |
| Hybrid on-device + cloud | OpenAI-Compatible Client |
| Desktop & native targets | Desktop & Native Platforms |

The generation loop

Every agent has the same shape: send a ChatMessage, iterate the response stream, dispatch each variant. Use the language’s exhaustive switch — onEnum(of:) (Swift) or is checks against the sealed interface (Kotlin) — so the compiler errors if a new MessageResponse case is added.
```swift
@MainActor
private func handle(_ response: MessageResponse) {
    switch onEnum(of: response) {
    case .chunk(let chunk):
        currentText += chunk.text
    case .reasoningChunk(let reasoning):
        log("Reasoning:", reasoning.reasoning)
    case .functionCalls(let payload):
        for call in payload.functionCalls {
            Task { await dispatch(call) }
        }
    case .audioSample(let audio):
        audioPlayer.enqueue(audio.samples, sampleRate: Int(audio.sampleRate))
    case .complete(let completion):
        currentText = ""
        if let stats = completion.stats {
            log("Done: \(stats.totalTokens) tokens at \(stats.tokenPerSecond) tok/s")
        }
    }
}
```

Multi-turn with tool calls

The defining feature of an agent: the model emits FunctionCalls, you execute the tool, append the result as a tool-role message, and continue. The same pattern works on every platform.
```swift
func agentLoop(initialQuestion: String) async throws {
    var workingConv = conversation!
    var pending = ChatMessage(role: .user, content: [.text(initialQuestion)])

    while true {
        var toolCalls: [LeapFunctionCall] = []
        for try await response in workingConv.generateResponse(message: pending) {
            switch onEnum(of: response) {
            case .chunk(let c):
                appendUI(c.text)
            case .functionCalls(let payload):
                toolCalls.append(contentsOf: payload.functionCalls)
            case .complete:
                break
            default:
                break
            }
        }

        if toolCalls.isEmpty { break }   // Agent is done

        // Execute tools, append results, loop.
        // `asyncMap` is a small sequential async-map helper (not in the standard library).
        let toolMessages = await toolCalls.asyncMap { call in
            let result = await runtimeDispatch(call)
            return ChatMessage(role: .tool, content: [.text(result)])
        }
        let updatedHistory = workingConv.history + toolMessages
        workingConv = workingConv.modelRunner.createConversationFromHistory(history: updatedHistory)
        pending = ChatMessage(role: .user, content: [.text("")])  // empty turn — let the model continue
    }
}
```
Define runtimeDispatch(_:) as your tool-call → result router: validate arguments, call the underlying implementation, JSON-encode the result. Register the corresponding LeapFunction definitions on the conversation before you start the loop — see Function Calling.
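A minimal sketch of such a router, using a simplified stand-in for LeapFunctionCall (the real SDK type also carries a name and an arguments dictionary) and a hypothetical get_weather tool — validate first, then call the implementation, then JSON-encode:

```swift
import Foundation

// Simplified stand-in for LeapFunctionCall, for illustration only.
struct FunctionCall {
    let name: String
    let arguments: [String: Any?]
}

// Route a tool call to its implementation and JSON-encode the result.
func runtimeDispatch(_ call: FunctionCall) async -> String {
    switch call.name {
    case "get_weather":
        // Validate arguments before touching the implementation.
        guard let city = call.arguments["city"] as? String, !city.isEmpty else {
            return #"{"error": "get_weather requires a non-empty 'city' string"}"#
        }
        let tempC = await getWeather(city: city)
        return #"{"city": "\#(city)", "temp_c": \#(tempC)}"#
    default:
        return #"{"error": "unknown tool: \#(call.name)"}"#
    }
}

// Hypothetical tool implementation; replace with a real API call.
func getWeather(city: String) async -> Double { 21.5 }
```

The string returned here becomes the body of the tool-role message appended to the history in the loop above.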

Multimodal inputs

Multimodality is model-specific. Most multimodal models ship as text + one other modality (vision OR audio), not both. Send .image(...) parts only to a vision-capable model and .audio(...) parts only to an audio-capable model. Verify on the model’s Hugging Face card before wiring up the input.
```swift
// Vision-capable model
let imageMessage = ChatMessage(
  role: .user,
  content: [.text("Describe what you see."), .image(jpegData)]
)

// Audio-capable model — WAV blob
let audioMessage = ChatMessage(
  role: .user,
  content: [.text("Transcribe."), .audio(wavData)]
)

// Audio-capable model — raw float32 PCM samples (no WAV re-encode)
let pcmMessage = ChatMessage(
  role: .user,
  content: [.text("How's my pronunciation?"),
            ChatMessageContent.fromFloatSamples(samples, sampleRate: 16000)]
)
```
See Messages & Content for audio format requirements (WAV, mono, 16 kHz recommended) and helpers for recording from the microphone.
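If you need to build the WAV blob yourself, here is a plain-Swift sketch that wraps raw float samples in a minimal 16-bit PCM container under the recommended assumptions (mono, 16 kHz — defer to Messages & Content for the authoritative requirements):

```swift
import Foundation

// Wrap float32 samples in [-1, 1] in a minimal 16-bit PCM WAV container.
// Assumes mono; defaults to the recommended 16 kHz sample rate.
func makeWAV(samples: [Float], sampleRate: UInt32 = 16_000) -> Data {
    let pcm = samples.map { s -> Int16 in
        Int16(max(-1, min(1, s)) * Float(Int16.max))   // clip, then scale
    }
    let dataSize = UInt32(pcm.count * 2)
    var d = Data()
    func put(_ s: String) { d.append(s.data(using: .ascii)!) }
    func put32(_ v: UInt32) { withUnsafeBytes(of: v.littleEndian) { d.append(contentsOf: $0) } }
    func put16(_ v: UInt16) { withUnsafeBytes(of: v.littleEndian) { d.append(contentsOf: $0) } }
    put("RIFF"); put32(36 + dataSize); put("WAVE")
    put("fmt "); put32(16)                 // fmt chunk, 16 bytes
    put16(1); put16(1)                     // PCM, mono
    put32(sampleRate)
    put32(sampleRate * 2)                  // byte rate = rate * channels * 2
    put16(2); put16(16)                    // block align, bits per sample
    put("data"); put32(dataSize)
    pcm.forEach { v in withUnsafeBytes(of: v.littleEndian) { d.append(contentsOf: $0) } }
    return d
}
```

Feed the result straight into a `.audio(...)` content part; for live microphone capture, fromFloatSamples skips this re-encode entirely.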

Complete view-model example

A ChatViewModel that loads the model, registers a tool, drives generation, and exposes streaming text to the UI.
```swift
import Foundation
import LeapModelDownloader
import Combine

@MainActor
final class ChatViewModel: ObservableObject {
    @Published var responseText = ""
    @Published var isLoading = false
    @Published var isGenerating = false
    @Published var errorMessage: String?

    private let downloader: ModelDownloader = {
        let caches = FileManager.default.urls(for: .cachesDirectory, in: .userDomainMask).first!.path
        let modelsDir = (caches as NSString).appendingPathComponent("leap_models")
        return ModelDownloader(config: LeapDownloaderConfig(saveDir: modelsDir))
    }()
    private var modelRunner: ModelRunner?
    private var conversation: Conversation?
    private var generationTask: Task<Void, Never>?

    func loadModel() async {
        isLoading = true
        defer { isLoading = false }
        do {
            let runner = try await downloader.loadModel(
                modelName: "LFM2.5-1.2B-Instruct",
                quantizationType: "Q4_K_M"
            )
            modelRunner = runner
            conversation = runner.createConversation(systemPrompt: "You are a helpful assistant.")
            // `weatherFunction` is an app-defined LeapFunction — see Function Calling.
            conversation?.registerFunction(weatherFunction)
        } catch {
            errorMessage = "Failed to load model: \(error.localizedDescription)"
        }
    }

    func send(_ text: String) {
        guard let conversation else { return }
        generationTask?.cancel()
        isGenerating = true
        responseText = ""

        generationTask = Task { [weak self] in
            defer { Task { @MainActor in self?.isGenerating = false } }
            do {
                let userMessage = ChatMessage(role: .user, content: [.text(text)])
                for try await response in conversation.generateResponse(
                    message: userMessage,
                    generationOptions: GenerationOptions(temperature: 0.3, minP: 0.15, repetitionPenalty: 1.05)
                ) {
                    await MainActor.run { self?.handle(response) }
                }
            } catch {
                await MainActor.run { self?.errorMessage = "Generation failed: \(error.localizedDescription)" }
            }
        }
    }

    func stopGeneration() {
        generationTask?.cancel()
    }

    @MainActor
    private func handle(_ response: MessageResponse) {
        switch onEnum(of: response) {
        case .chunk(let c): responseText += c.text
        case .reasoningChunk(let r): print("[thinking] \(r.reasoning)")
        case .functionCalls(let f):
            // `dispatch(_:)` is the app's tool-call router — see Function Calling.
            for call in f.functionCalls { Task { await dispatch(call) } }
        case .audioSample(let a):
            // `audioRenderer` is an app-defined playback queue.
            audioRenderer.enqueue(a.samples, sampleRate: Int(a.sampleRate))
        case .complete(let c):
            if let stats = c.stats {
                print("\nFinished — \(stats.totalTokens) tokens at \(stats.tokenPerSecond) tok/s")
            }
        }
    }
}
```

Pitfalls and best practices

  • Always handle every MessageResponse case. Even if you only care about .chunk and .complete, give .functionCalls, .audioSample, and .reasoningChunk explicit (empty) branches rather than a catch-all default. That keeps the switch exhaustive, so the compiler flags the omission when a new variant is added.
  • Cancel before re-issuing. Don’t start a second generateResponse(...) while one is in flight. Either cancel the previous Task / Job, or check conversation.isGenerating first.
  • Don’t runBlocking in production paths. It’s fine in onCleared() for guaranteed cleanup (because viewModelScope is already cancelled at that point). Anywhere else, it freezes the calling thread.
  • Use cacheDir (Android) / cachesDirectory (iOS) for KV-cache reuse paths. They’re regenerable — letting the OS reclaim them on storage pressure is the right semantics. See Model Loading → KV cache reuse.
  • Validate tool-call arguments before dispatching. The arguments: Map<String, Any?> (Kotlin) / [String: Any?] (Swift) shape is unsafe by design — defensively coerce types and apply business-level invariants.
  • Match the model’s recommended sampling parameters. The LEAP bundle manifest (sampling_parameters under generation_time_parameters in each <Quant>.json on LiquidAI/LeapBundles) carries defaults tuned per checkpoint for the llama.cpp engine the SDK runs. Overriding temperature and friends often hurts quality more than it helps — start from the manifest values rather than the HF model card defaults (the two can differ).
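The argument-validation point is worth spelling out, because JSON decoding can surface numbers as Int, Double, or NSNumber depending on the path. A hedged sketch of defensive coercion plus a business-level invariant, for a hypothetical set_temperature tool (the names are illustrative, not SDK API):

```swift
import Foundation

// Defensively coerce a loosely-typed tool-call argument to Double.
// JSON decoding may surface numbers as Int, Double, NSNumber, or even a quoted string.
func coerceDouble(_ value: Any?) -> Double? {
    switch value {
    case let d as Double: return d
    case let i as Int: return Double(i)
    case let n as NSNumber: return n.doubleValue
    case let s as String: return Double(s)   // some models quote numbers
    default: return nil
    }
}

struct ToolArgumentError: Error, CustomStringConvertible {
    let description: String
}

// Business-level invariant on top of type coercion (hypothetical tool).
func validateSetTemperature(_ args: [String: Any?]) -> Result<Double, ToolArgumentError> {
    guard let celsius = coerceDouble(args["celsius"]) else {
        return .failure(ToolArgumentError(description: "'celsius' must be a number"))
    }
    guard (-30...50).contains(celsius) else {
        return .failure(ToolArgumentError(description: "'celsius' out of range -30...50"))
    }
    return .success(celsius)
}
```

Run this kind of check at the top of your tool router so a malformed call produces a structured error message for the model instead of a crash.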

Platform-specific concerns

  • iOS deployment target: 17.0+ · macOS: 15.0+
  • Xcode 16.0+, Swift 6.0
  • Run model loads inside a Task from a @MainActor view model. ModelDownloader background downloads (via URLSessionConfiguration.background(withIdentifier:)) survive app suspension; see Model Loading.
  • The voice widget exists on UIKit and AppKit — see Voice Assistant Widget.

Troubleshooting

| Symptom | Likely cause / fix |
| --- | --- |
| LeapModelLoadingException / LeapError.modelLoadingFailure | Missing companion file for a multimodal model (mmproj / audio decoder). Verify the manifest or pass explicit paths via loadSimpleModel(ModelSource(...)). |
| Model loads but generates gibberish | Wrong sampling parameters or wrong function-call parser for the model family. Check the model card; default to LFMFunctionCallParser for LFM models, HermesFunctionCallParser for Qwen3/Hermes. |
| "ZIP archive corrupted" on download | Network hiccup mid-download. LeapDownloader / LeapModelDownloader validates SHA-256, so a partial file fails the check. Remove the cache directory and retry. |
| Generation hangs after cancel() | Cancellation is cooperative — the engine checks between tokens, so expect at most one extra token of slack. If it hangs longer, you may be missing a Job cancel, or the stream is awaited on a thread other than the one you cancel from. |
| Voice widget records silence | Missing microphone permission, or the AVAudioSession / Android audio config is not set to playAndRecord / mono / 16 kHz. See Voice Assistant Widget. |
| K/N executable fails at start with dlsym@GLIBC_2.34 | The runtime host's glibc is older than 2.34. Upgrade to Ubuntu 22.04+, Debian 12+, or RHEL 9+, or build for an older runtime target. |
| Compile error "@Guide annotation missing" (Swift) | Every stored property on a @Generatable struct needs a @Guide annotation. |
For deeper failure-mode coverage, see Utilities → Errors.