
This guide walks through the patterns for building a real AI agent — multi-turn conversation, function calling with tool dispatch, multimodal inputs, and a complete view-model wiring. The cross-platform pages cover individual APIs in depth — start here for the full picture, then drill into the dedicated references when you need details.

Architecture

Downloader (LeapModelDownloader on Android · ModelDownloader on iOS / macOS · LeapDownloader on JVM/native)
        ↓
ModelRunner
        ↓
Conversation
        ↓
MessageResponse (streaming: Chunk · ReasoningChunk · FunctionCalls · AudioSample · Complete)
The ModelRunner owns native memory; the Conversation holds chat history; the MessageResponse stream delivers incremental output. Same shape on every platform.

| Concern | Where it lives |
| --- | --- |
| Install & set up the dependency | Quick Start |
| Load the model | Model Loading |
| Drive the streaming loop | Conversation & Generation |
| Define and dispatch tools | Function Calling |
| Force structured JSON output | Constrained Generation |
| Voice UX | Voice Assistant Widget |
| Hybrid on-device + cloud | OpenAI-Compatible Client |
| Desktop & native targets | Desktop & Native Platforms |

The generation loop

Every agent has the same shape: send a ChatMessage, iterate the response stream, dispatch each variant. Use the language’s exhaustive switch — onEnum(of:) (Swift) or is checks against the sealed interface (Kotlin) — so the compiler errors if a new MessageResponse case is added.
```swift
@MainActor
private func handle(_ response: MessageResponse) {
    switch onEnum(of: response) {
    case .chunk(let chunk):
        currentText += chunk.text
    case .reasoningChunk(let reasoning):
        log("Reasoning:", reasoning.reasoning)
    case .functionCalls(let payload):
        for call in payload.functionCalls {
            Task { await dispatch(call) }
        }
    case .audioSample(let audio):
        audioPlayer.enqueue(audio.samples, sampleRate: Int(audio.sampleRate))
    case .complete(let completion):
        currentText = ""
        if let stats = completion.stats {
            log("Done: \(stats.totalTokens) tokens at \(stats.tokenPerSecond) tok/s")
        }
    }
}
```

Multi-turn with tool calls

The defining feature of an agent: the model emits FunctionCalls, you execute the tool, append the result as a tool-role message, and continue. The same pattern works on every platform.
```swift
func agentLoop(initialQuestion: String) async throws {
    var workingConv = conversation!
    var pending = ChatMessage(role: .user, content: [.text(initialQuestion)])

    while true {
        var toolCalls: [LeapFunctionCall] = []
        for try await response in workingConv.generateResponse(message: pending) {
            switch onEnum(of: response) {
            case .chunk(let c):
                appendUI(c.text)
            case .functionCalls(let payload):
                toolCalls.append(contentsOf: payload.functionCalls)
            case .complete:
                break
            default:
                break
            }
        }

        if toolCalls.isEmpty { break }   // Agent is done

        // Execute tools, append results, loop.
        // `asyncMap` is a small sequential async-map helper (not in the standard library).
        let toolMessages = await toolCalls.asyncMap { call in
            let result = await runtimeDispatch(call)
            return ChatMessage(role: .tool, content: [.text(result)])
        }
        let updatedHistory = workingConv.history + toolMessages
        workingConv = workingConv.modelRunner.createConversationFromHistory(history: updatedHistory)
        pending = ChatMessage(role: .user, content: [.text("")])  // empty turn — let the model continue
    }
}
```
Define runtimeDispatch(_:) as your tool-call → result router: validate arguments, call the underlying implementation, JSON-encode the result. Register the corresponding LeapFunction definitions on the conversation before you start the loop — see Function Calling.
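A minimal sketch of such a router, using a simplified stand-in for LeapFunctionCall (the real SDK type also carries a name and an arguments dictionary) and a hypothetical get_weather tool — validate first, then call the implementation, then JSON-encode:

```swift
import Foundation

// Simplified stand-in for LeapFunctionCall, for illustration only.
struct FunctionCall {
    let name: String
    let arguments: [String: Any?]
}

// Route a tool call to its implementation and JSON-encode the result.
func runtimeDispatch(_ call: FunctionCall) async -> String {
    switch call.name {
    case "get_weather":
        // Validate arguments before touching the implementation.
        guard let city = call.arguments["city"] as? String, !city.isEmpty else {
            return #"{"error": "get_weather requires a non-empty 'city' string"}"#
        }
        let tempC = await getWeather(city: city)
        return #"{"city": "\#(city)", "temp_c": \#(tempC)}"#
    default:
        return #"{"error": "unknown tool: \#(call.name)"}"#
    }
}

// Hypothetical tool implementation; replace with a real API call.
func getWeather(city: String) async -> Double { 21.5 }
```

The string returned here becomes the body of the tool-role message appended to the history in the loop above.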

Multimodal inputs

Multimodality is model-specific. Most multimodal models ship as text + one other modality (vision OR audio), not both. Send .image(...) parts only to a vision-capable model and .audio(...) parts only to an audio-capable model. Verify on the model’s Hugging Face card before wiring up the input.
```swift
// Vision-capable model
let imageMessage = ChatMessage(
  role: .user,
  content: [.text("Describe what you see."), .image(jpegData)]
)

// Audio-capable model — WAV blob
let audioMessage = ChatMessage(
  role: .user,
  content: [.text("Transcribe."), .audio(wavData)]
)

// Audio-capable model — raw float32 PCM samples (no WAV re-encode)
let pcmMessage = ChatMessage(
  role: .user,
  content: [.text("How's my pronunciation?"),
            ChatMessageContent.fromFloatSamples(samples, sampleRate: 16000)]
)
```
See Messages & Content for audio format requirements (WAV, mono, 16 kHz recommended) and helpers for recording from the microphone.
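If you need to build the WAV blob yourself, here is a plain-Swift sketch that wraps raw float samples in a minimal 16-bit PCM container under the recommended assumptions (mono, 16 kHz — defer to Messages & Content for the authoritative requirements):

```swift
import Foundation

// Wrap float32 samples in [-1, 1] in a minimal 16-bit PCM WAV container.
// Assumes mono; defaults to the recommended 16 kHz sample rate.
func makeWAV(samples: [Float], sampleRate: UInt32 = 16_000) -> Data {
    let pcm = samples.map { s -> Int16 in
        Int16(max(-1, min(1, s)) * Float(Int16.max))   // clip, then scale
    }
    let dataSize = UInt32(pcm.count * 2)
    var d = Data()
    func put(_ s: String) { d.append(s.data(using: .ascii)!) }
    func put32(_ v: UInt32) { withUnsafeBytes(of: v.littleEndian) { d.append(contentsOf: $0) } }
    func put16(_ v: UInt16) { withUnsafeBytes(of: v.littleEndian) { d.append(contentsOf: $0) } }
    put("RIFF"); put32(36 + dataSize); put("WAVE")
    put("fmt "); put32(16)                 // fmt chunk, 16 bytes
    put16(1); put16(1)                     // PCM, mono
    put32(sampleRate)
    put32(sampleRate * 2)                  // byte rate = rate * channels * 2
    put16(2); put16(16)                    // block align, bits per sample
    put("data"); put32(dataSize)
    pcm.forEach { v in withUnsafeBytes(of: v.littleEndian) { d.append(contentsOf: $0) } }
    return d
}
```

Feed the result straight into a `.audio(...)` content part; for live microphone capture, fromFloatSamples skips this re-encode entirely.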

Complete view-model example

A ChatViewModel that loads the model, registers a tool, drives generation, and exposes streaming text to the UI.
```swift
import Foundation
import LeapModelDownloader
import Combine

@MainActor
final class ChatViewModel: ObservableObject {
    @Published var responseText = ""
    @Published var isLoading = false
    @Published var isGenerating = false
    @Published var errorMessage: String?

    private let downloader: ModelDownloader = {
        let caches = FileManager.default.urls(for: .cachesDirectory, in: .userDomainMask).first!.path
        let modelsDir = (caches as NSString).appendingPathComponent("leap_models")
        return ModelDownloader(config: LeapDownloaderConfig(saveDir: modelsDir))
    }()
    private var modelRunner: ModelRunner?
    private var conversation: Conversation?
    private var generationTask: Task<Void, Never>?

    func loadModel() async {
        isLoading = true
        defer { isLoading = false }
        do {
            let runner = try await downloader.loadModel(
                modelName: "LFM2.5-1.2B-Instruct",
                quantizationType: "Q4_K_M"
            )
            modelRunner = runner
            conversation = runner.createConversation(systemPrompt: "You are a helpful assistant.")
            // `weatherFunction` is an app-defined LeapFunction — see Function Calling.
            conversation?.registerFunction(weatherFunction)
        } catch {
            errorMessage = "Failed to load model: \(error.localizedDescription)"
        }
    }

    func send(_ text: String) {
        guard let conversation else { return }
        generationTask?.cancel()
        isGenerating = true
        responseText = ""

        generationTask = Task { [weak self] in
            defer { Task { @MainActor in self?.isGenerating = false } }
            do {
                let userMessage = ChatMessage(role: .user, content: [.text(text)])
                for try await response in conversation.generateResponse(
                    message: userMessage,
                    generationOptions: GenerationOptions(temperature: 0.3, minP: 0.15, repetitionPenalty: 1.05)
                ) {
                    await MainActor.run { self?.handle(response) }
                }
            } catch {
                await MainActor.run { self?.errorMessage = "Generation failed: \(error.localizedDescription)" }
            }
        }
    }

    func stopGeneration() {
        generationTask?.cancel()
    }

    @MainActor
    private func handle(_ response: MessageResponse) {
        switch onEnum(of: response) {
        case .chunk(let c): responseText += c.text
        case .reasoningChunk(let r): print("[thinking] \(r.reasoning)")
        case .functionCalls(let f):
            // `dispatch(_:)` is the app's tool-call router — see Function Calling.
            for call in f.functionCalls { Task { await dispatch(call) } }
        case .audioSample(let a):
            // `audioRenderer` is an app-defined playback queue.
            audioRenderer.enqueue(a.samples, sampleRate: Int(a.sampleRate))
        case .complete(let c):
            if let stats = c.stats {
                print("\nFinished — \(stats.totalTokens) tokens at \(stats.tokenPerSecond) tok/s")
            }
        }
    }
}
```

Pitfalls and best practices

  • Always handle every MessageResponse case. Even if you only care about .chunk and .complete, give .functionCalls, .audioSample, and .reasoningChunk explicit (empty) branches rather than a catch-all default. That keeps the switch exhaustive, so the compiler flags the omission when a new variant is added.
  • Cancel before re-issuing. Don’t start a second generateResponse(...) while one is in flight. Either cancel the previous Task / Job, or check conversation.isGenerating first.
  • Don’t runBlocking in production paths. It’s fine in onCleared() for guaranteed cleanup (because viewModelScope is already cancelled at that point). Anywhere else, it freezes the calling thread.
  • Use cacheDir (Android) / cachesDirectory (iOS) for KV-cache reuse paths. They’re regenerable — letting the OS reclaim them on storage pressure is the right semantics. See Model Loading → KV cache reuse.
  • Validate tool-call arguments before dispatching. The arguments: Map<String, Any?> (Kotlin) / [String: Any?] (Swift) shape is unsafe by design — defensively coerce types and apply business-level invariants.
  • Match the model’s recommended sampling parameters. The LEAP bundle manifest (sampling_parameters under generation_time_parameters in each <Quant>.json on LiquidAI/LeapBundles) carries defaults tuned per checkpoint for the llama.cpp engine the SDK runs. Overriding temperature and friends often hurts quality more than it helps — start from the manifest values rather than the HF model card defaults (the two can differ).
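The argument-validation point is worth spelling out, because JSON decoding can surface numbers as Int, Double, or NSNumber depending on the path. A hedged sketch of defensive coercion plus a business-level invariant, for a hypothetical set_temperature tool (the names are illustrative, not SDK API):

```swift
import Foundation

// Defensively coerce a loosely-typed tool-call argument to Double.
// JSON decoding may surface numbers as Int, Double, NSNumber, or even a quoted string.
func coerceDouble(_ value: Any?) -> Double? {
    switch value {
    case let d as Double: return d
    case let i as Int: return Double(i)
    case let n as NSNumber: return n.doubleValue
    case let s as String: return Double(s)   // some models quote numbers
    default: return nil
    }
}

struct ToolArgumentError: Error, CustomStringConvertible {
    let description: String
}

// Business-level invariant on top of type coercion (hypothetical tool).
func validateSetTemperature(_ args: [String: Any?]) -> Result<Double, ToolArgumentError> {
    guard let celsius = coerceDouble(args["celsius"]) else {
        return .failure(ToolArgumentError(description: "'celsius' must be a number"))
    }
    guard (-30...50).contains(celsius) else {
        return .failure(ToolArgumentError(description: "'celsius' out of range -30...50"))
    }
    return .success(celsius)
}
```

Run this kind of check at the top of your tool router so a malformed call produces a structured error message for the model instead of a crash.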

Platform-specific concerns

  • iOS deployment target: 17.0+ · macOS: 15.0+
  • Xcode 16.0+, Swift 6.0
  • Run model loads inside a Task from a @MainActor view model. ModelDownloader background downloads (via URLSessionConfiguration.background(withIdentifier:)) survive app suspension; see Model Loading.
  • The voice widget exists on UIKit and AppKit — see Voice Assistant Widget.

Troubleshooting

| Symptom | Likely cause / fix |
| --- | --- |
| LeapModelLoadingException / LeapError.modelLoadingFailure | Missing companion file for a multimodal model (mmproj / audio decoder). Verify the manifest or pass explicit paths via loadSimpleModel(ModelSource(...)). |
| Model loads but generates gibberish | Wrong sampling parameters or wrong function-call parser for the model family. Check the model card; default to LFMFunctionCallParser for LFM models, HermesFunctionCallParser for Qwen3/Hermes. |
| "ZIP archive corrupted" on download | Network hiccup mid-download. LeapDownloader / LeapModelDownloader validates SHA-256, so a partial file fails the check. Remove the cache directory and retry. |
| Generation hangs after cancel() | Cancellation is cooperative — the engine checks between tokens, so expect at most one extra token of slack. If it hangs longer, you may be missing a Job cancel, or the stream is awaited on a thread other than the one you cancel from. |
| Voice widget records silence | Missing microphone permission, or the AVAudioSession / Android audio config is not set to playAndRecord / mono / 16 kHz. See Voice Assistant Widget. |
| K/N executable fails at start with dlsym@GLIBC_2.34 | The runtime host's glibc is older than 2.34. Upgrade to Ubuntu 22.04+, Debian 12+, or RHEL 9+, or build for an older runtime target. |
| Compile error "@Guide annotation missing" (Swift) | Every stored property on a @Generatable struct needs a @Guide annotation. |
For deeper failure-mode coverage, see Utilities → Errors.