Architecture
ModelRunner owns native memory; the Conversation holds chat history; the MessageResponse stream delivers incremental output. Same shape on every platform.
| Concern | Where it lives |
|---|---|
| Install & set up the dependency | Quick Start |
| Load the model | Model Loading |
| Drive the streaming loop | Conversation & Generation |
| Define and dispatch tools | Function Calling |
| Force structured JSON output | Constrained Generation |
| Voice UX | Voice Assistant Widget |
| Hybrid on-device + cloud | OpenAI-Compatible Client |
| Desktop & native targets | Desktop & Native Platforms |
The generation loop
Every agent has the same shape: send a ChatMessage, iterate the response stream, dispatch each variant. Use the language’s exhaustive switch, onEnum(of:) (Swift) or is checks against the sealed interface (Kotlin), so the compiler errors if a new MessageResponse case is added.
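As a rough illustration of that dispatch shape, here is a minimal, self-contained Kotlin sketch. The sealed interface below is a hypothetical stand-in, not the SDK's own MessageResponse: the variant names follow the cases mentioned on this page, but the payload fields are assumptions, and a plain Sequence stands in for the real response stream.

```kotlin
// Hypothetical stand-in for the SDK's MessageResponse; real payload types may differ.
sealed interface MessageResponse {
    data class Chunk(val text: String) : MessageResponse
    data class ReasoningChunk(val text: String) : MessageResponse
    data class FunctionCalls(val names: List<String>) : MessageResponse
    class AudioSample(val sampleCount: Int) : MessageResponse
    object Complete : MessageResponse
}

// Returns the text the UI should append for one streamed response.
// `when` is used as an expression with no `else` branch, so adding a new
// variant to the sealed interface makes this function fail to compile.
fun visibleText(response: MessageResponse): String = when (response) {
    is MessageResponse.Chunk -> response.text
    is MessageResponse.ReasoningChunk -> ""  // hidden from the transcript in this sketch
    is MessageResponse.FunctionCalls -> ""   // a real app would dispatch tools here
    is MessageResponse.AudioSample -> ""     // a real app would enqueue audio here
    MessageResponse.Complete -> ""
}

// Iterates the (stand-in) stream and concatenates the visible text.
fun collect(stream: Sequence<MessageResponse>): String =
    stream.joinToString(separator = "") { visibleText(it) }

fun main() {
    val stream: Sequence<MessageResponse> = sequenceOf(
        MessageResponse.ReasoningChunk("thinking"),
        MessageResponse.Chunk("Hel"),
        MessageResponse.Chunk("lo"),
        MessageResponse.Complete,
    )
    println(collect(stream)) // prints Hello
}
```

The point of the sketch is the missing else branch: a catch-all would silently swallow any variant added later, while the exhaustive form turns that into a compile error.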
Multi-turn with tool calls
The defining feature of an agent: the model emits FunctionCalls, you execute the tool, append the result as a tool-role message, and continue. The same pattern works on every platform.
Treat runtimeDispatch(_:) as your tool-call → result router: validate arguments, call the underlying implementation, JSON-encode the result. Register the corresponding LeapFunction definitions on the conversation before you start the loop; see Function Calling.
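A router of that kind might look roughly like the Kotlin sketch below. Everything in it is an assumption made for illustration: ToolCall is a hypothetical stand-in for whatever the SDK hands you, get_weather and getWeather are invented example tools, and the JSON is assembled by hand only to keep the sketch dependency-free (a real app would more likely use a JSON library).

```kotlin
// Hypothetical stand-in for one tool call emitted by the model; not the SDK's type.
data class ToolCall(val name: String, val arguments: Map<String, Any?>)

// Minimal JSON string escaping, enough for this sketch (backslashes, quotes, newlines).
fun jsonString(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\""

// The underlying tool implementation (hypothetical): a canned lookup.
fun getWeather(city: String): String = if (city == "Tokyo") "sunny" else "unknown"

// Routes a tool call to its implementation and returns a JSON-encoded result
// that the caller would append to the conversation as a tool-role message.
fun runtimeDispatch(call: ToolCall): String = when (call.name) {
    "get_weather" -> {
        // Validate: the argument must be present, a String, and non-blank.
        val city = (call.arguments["city"] as? String)?.trim()
        if (city.isNullOrEmpty()) {
            """{"error":"missing or invalid 'city'"}"""
        } else {
            val forecast = getWeather(city)
            """{"city":${jsonString(city)},"forecast":${jsonString(forecast)}}"""
        }
    }
    else -> {
        val message = jsonString("unknown tool: " + call.name)
        """{"error":$message}"""
    }
}

fun main() {
    println(runtimeDispatch(ToolCall("get_weather", mapOf("city" to "Tokyo"))))
    // prints {"city":"Tokyo","forecast":"sunny"}
    println(runtimeDispatch(ToolCall("get_weather", mapOf("city" to 42))))
    // prints {"error":"missing or invalid 'city'"}
}
```

Returning an error object instead of throwing keeps the loop going: the model sees the failure as a tool result and can retry with corrected arguments.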
Multimodal inputs
Multimodality is model-specific. Most multimodal models ship as text + one other modality (vision OR audio), not both. Send image parts (Swift ChatMessageContent.fromJPEGData(_:) / Kotlin ImageUtils.fromBitmap(...)) only to a vision-capable model, and audio parts (Swift ChatMessageContent.fromWAVData(_:) / Kotlin ChatMessageContent.Audio(...)) only to an audio-capable model. Verify on the model’s Hugging Face card before wiring up the input.
Complete view-model example
A ChatViewModel that loads the model, registers a tool, drives generation, and exposes streaming text to the UI.
Pitfalls and best practices
- Always handle every MessageResponse case. Even if you only care about .chunk and .complete, give .functionCalls, .audioSample, and .reasoningChunk explicit (empty) branches rather than a catch-all default, so that the exhaustive switch fails to compile, and flags the gap, when a new variant is added.
- Cancel before re-issuing. Don’t start a second generateResponse(...) while one is in flight. Either cancel the previous Task/Job, or check conversation.isGenerating first.
- Don’t runBlocking in production paths. It’s fine in onCleared() for guaranteed cleanup (because viewModelScope is already cancelled at that point). Anywhere else, it freezes the calling thread.
- Use cacheDir (Android) / cachesDirectory (iOS) for KV-cache reuse paths. They’re regenerable, so letting the OS reclaim them on storage pressure is the right semantics. See Model Loading → KV cache reuse.
- Validate tool-call arguments before dispatching. The arguments: Map<String, Any?> (Kotlin) / [String: Any?] (Swift) shape is unsafe by design; defensively coerce types and apply business-level invariants.
- Match the model’s recommended sampling parameters. The LEAP bundle manifest (sampling_parameters under generation_time_parameters in each <Quant>.json on LiquidAI/LeapBundles) carries defaults tuned per checkpoint for the llama.cpp engine the SDK runs. Overriding temperature and friends often hurts quality more than it helps; start from the manifest values rather than the HF model card defaults (the two can differ).
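To make the argument-validation advice concrete, here is a small Kotlin sketch of defensive coercion. It assumes a hypothetical timer tool whose minutes argument may arrive as an Int, Long, Double, or String depending on how the call was decoded; the helper names and the 1 to 180 range are invented for illustration and are not part of the SDK.

```kotlin
// Coerces a loosely typed argument to Int, or returns null if that cannot be done safely.
fun coerceInt(value: Any?): Int? = when (value) {
    is Int -> value
    is Long ->
        if (value in Int.MIN_VALUE.toLong()..Int.MAX_VALUE.toLong()) value.toInt() else null
    is Double ->
        // Accept only whole numbers inside the Int range; NaN and infinities fail the check.
        if (value % 1.0 == 0.0 && value in Int.MIN_VALUE.toDouble()..Int.MAX_VALUE.toDouble()) {
            value.toInt()
        } else {
            null
        }
    is String -> value.trim().toIntOrNull()
    else -> null
}

// Hypothetical tool argument: type coercion first, then the business-level invariant.
fun timerMinutes(arguments: Map<String, Any?>): Int? =
    coerceInt(arguments["minutes"])?.takeIf { it in 1..180 }

fun main() {
    println(timerMinutes(mapOf("minutes" to 15.0)))   // prints 15
    println(timerMinutes(mapOf("minutes" to "20")))   // prints 20
    println(timerMinutes(mapOf("minutes" to 15.5)))   // prints null
    println(timerMinutes(mapOf("minutes" to 100000))) // prints null
}
```

Separating coercion from the range check keeps the two failure modes distinct, which helps when you want to return a specific error message to the model.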
Platform-specific concerns
- iOS / macOS
  - iOS deployment target: 17.0+ · macOS: 15.0+
  - Xcode 16.0+, Swift 6.0
  - Run model loads inside a Task from a @MainActor view model. ModelDownloader background downloads (via URLSessionConfiguration.background(withIdentifier:)) survive app suspension; see Model Loading.
  - The voice widget exists on UIKit and AppKit; see Voice Assistant Widget.
- Android
- JVM / Linux native / Windows native
Troubleshooting
| Symptom | Likely cause / fix |
|---|---|
| LeapModelLoadingException / LeapError.modelLoadingFailure | Missing companion file for multimodal model (mmproj / audio decoder). Verify the manifest or pass explicit paths via loadSimpleModel(ModelSource(...)). |
| Model loads but generates gibberish | Wrong sampling parameters or wrong function-call parser for the model family. Check the model card; default to LFMFunctionCallParser for LFM models, HermesFunctionCallParser for Qwen3/Hermes. |
| “ZIP archive corrupted” on download | Network hiccup mid-download. LeapDownloader / LeapModelDownloader validates SHA-256, so a partial file fails the check. Remove the cache directory and retry. |
| Generation hangs after cancel() | Cancellation is cooperative: the engine checks between tokens, so there is at most one extra token of slack. If the hang lasts longer, you may be missing a Job cancel, or the stream is being awaited on a thread other than the one you’re cancelling from. |
| Voice widget records silence | Missing microphone permission, or AVAudioSession/Android audio config not set to playAndRecord / mono / 16 kHz. See Voice Assistant Widget. |
| Kotlin/Native (K/N) executable fails at start with dlsym@GLIBC_2.34 | Runtime host’s glibc is older than 2.34. Upgrade to Ubuntu 22.04+, Debian 12+, RHEL 9+, or build for an older runtime target. |
| Compile error “@Guide annotation missing” (Swift) | All properties on a @Generatable struct need a @Guide. Annotate every stored property. |