The Leap SDK is Liquid AI’s official on-device inference SDK and the only SDK with first-class support for Liquid Foundation Models (LFMs): LFM2, LFM2.5 (text, thinking, JP, VL), and LFM2.5-Audio. “First-class” means every published Liquid checkpoint is supported, validated, and shipped through this SDK on day one: the same team that trains the models ships the engine, sampler defaults, chat templates, and tool-call parsers that run them. There is no separate adapter layer, no community port, no upstream-rebase lag.
It’s also a Kotlin Multiplatform library: the same `ModelRunner` / `Conversation` / `MessageResponse` API runs on iOS, macOS, Android, JVM desktop, Linux native, Windows native, and (preview) wasmJs. The Swift surface is generated through Kotlin/Native + SKIE and ships as XCFrameworks; the Android/JVM surface ships as Maven Central artifacts. The call shapes are identical on both surfaces; only the language and packaging differ.
Ready to install?
Jump to the Quick Start — install via SPM or Gradle, load a model, stream a response.
What “first-class support for Liquid models” gets you
- Day-one model coverage. New LFM checkpoints land in the SDK release that announces them: no waiting for a generic runtime to catch up to a new architecture, no manual quant conversion, no template-mismatch debugging. The LEAP Model Library is the canonical distribution path and the SDK pulls directly from it.
- Per-checkpoint validated defaults. The sampling parameters baked into each model’s bundle manifest (`sampling_parameters` under `generation_time_parameters` in each `<Quant>.json` on LiquidAI/LeapBundles) are the values the training team validated for that exact checkpoint. The SDK applies them automatically: no `temperature=0.7` placeholder retuning, no token-stream artifacts from the wrong `min_p` / `repetition_penalty`.
- LFM-native special tokens and chat templates. The shipped engine knows how to filter LFM control tokens before they reach your stream, applies the right chat template per checkpoint, and parses LFM’s hermes and pythonic function-call dialects out of the box. Generic SDKs treat these as opaque text and surface raw tokens; Leap surfaces typed `MessageResponse.FunctionCalls` with parsed argument maps.
- Multimodal LFMs in one API. Vision (LFM2-VL family) and audio (LFM2.5-Audio) plug into the same `ChatMessage` / `ChatMessageContent` types you already use for text. Image inputs travel as JPEG bytes; audio travels as WAV blobs (or raw float32 PCM on Kotlin via `AudioPcmF32`). On the output side, `MessageResponse.AudioSample` streams float32 PCM frames for audio-out checkpoints. No separate runtime per modality.
- Constrained generation, end-to-end. Kotlin annotations (`@Generatable` / `@Guide` on `@Serializable` data classes) and Swift macros (`@Generatable` / `@Guide` synthesizing `jsonSchema()` at compile time) produce JSON Schemas the engine enforces at decode time. The model’s output is guaranteed to parse into your type.
- One-call model fetching from the LEAP Model Library. `LeapModelDownloader.loadModel(modelName:, quantizationType:)` resolves a manifest, downloads the right GGUF + matching `mmproj` / audio-decoder companion files for the checkpoint, caches them on disk, and hands back a `ModelRunner`: one call, no manual path wiring, no companion-file detection. Background-safe on iOS (`URLSessionConfiguration.background(withIdentifier:)`), WorkManager-backed on Android (survives app restarts).
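To make the typed-response point above concrete, here is a minimal Kotlin sketch of consuming a stream in which tool calls arrive as parsed values rather than raw text. `StreamEvent`, `FunctionCall`, and `consume` are hypothetical stand-ins written for this example; they are not the Leap SDK's declarations, and the real `MessageResponse` hierarchy may differ in names and shape.

```kotlin
// Hypothetical stand-ins for illustration only; NOT the Leap SDK's types.
sealed interface StreamEvent {
    data class TextChunk(val text: String) : StreamEvent
    data class FunctionCalls(val calls: List<FunctionCall>) : StreamEvent
    data class Complete(val finishReason: String) : StreamEvent
}

// A parsed call: the tool name plus an argument map, with no raw control tokens.
data class FunctionCall(val name: String, val arguments: Map<String, Any?>)

// Accumulates generated text and dispatches each parsed call to a handler.
// Returns the full text and the list of tool results, in arrival order.
fun consume(
    events: Sequence<StreamEvent>,
    tools: Map<String, (Map<String, Any?>) -> String>,
): Pair<String, List<String>> {
    val text = StringBuilder()
    val toolResults = mutableListOf<String>()
    loop@ for (event in events) {
        when (event) {
            is StreamEvent.TextChunk -> {
                text.append(event.text)
            }
            is StreamEvent.FunctionCalls -> {
                for (call in event.calls) {
                    val handler = tools[call.name]
                    toolResults += handler?.invoke(call.arguments)
                        ?: "unknown tool: ${call.name}"
                }
            }
            is StreamEvent.Complete -> break@loop
        }
    }
    return text.toString() to toolResults
}

fun main() {
    // A hand-built event sequence standing in for a real generation stream.
    val events = sequenceOf(
        StreamEvent.TextChunk("Checking the weather. "),
        StreamEvent.FunctionCalls(
            listOf(FunctionCall("get_weather", mapOf("city" to "Paris")))
        ),
        StreamEvent.Complete("tool_calls"),
    )
    val tools = mapOf<String, (Map<String, Any?>) -> String>(
        "get_weather" to { args: Map<String, Any?> -> "sunny in ${args["city"]}" },
    )
    val (text, results) = consume(events, tools)
    println(text)
    println(results)
}
```

Because the event type is sealed, the `when` is checked for exhaustiveness at compile time, which is the practical benefit of receiving typed function calls instead of scanning text for control tokens.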
Other features
- On-device by default. No cloud round-trip, no per-token cost, full privacy, full offline operation.
- KV cache reuse for fast multi-turn. Bounded-LRU disk + memory `CacheOptions` skip the prefill step for shared prompt prefixes: TTFT on a long system prompt or RAG preamble drops from seconds to under a hundred milliseconds on cache hits. Disabled by default; opt in with `LiquidCacheOptions.enabled(path:)` / `ModelLoadingOptions.cacheOptions(path = ...)`.
- Memory-mapped weight loading. `use_mmap=true` is the default since v0.10.4. Model weights are file-backed, not anonymous RSS, so iOS jetsam and Android LMK score the app much lower under memory pressure, cold load returns as soon as the file is mapped, and warm reloads stream from the kernel page cache.
- Hybrid on-device + cloud routing. `leap-openai-client` ships in the same release as an opt-in OpenAI-compatible chat-completions client (OpenAI, OpenRouter, vLLM, llama-server). One binary, two code paths: route small/fast prompts on-device, fall back to a cloud model for hard ones, share the same `ChatMessage` types.
- Drop-in voice assistant UI. `leap-ui` ships a Compose Multiplatform voice widget (animated orb, mic button, status label, state machine) that pairs with `VoiceConversation` to wire LFM2.5-Audio into a working voice experience without writing the recording-and-playback plumbing yourself.
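The bounded-LRU idea behind prefix reuse can be sketched in a few lines of JVM Kotlin. This is an illustration under simplifying assumptions, not the SDK's implementation: `PrefixLruCache` is a hypothetical class, keys are plain strings rather than token sequences, and the cached value is an opaque placeholder for KV state.

```kotlin
import java.util.LinkedHashMap

// Hypothetical illustration of a bounded LRU keyed by prompt prefix.
// A real KV cache would key on token ids and store tensors, not strings.
class PrefixLruCache<V>(private val capacity: Int) {
    // accessOrder = true makes iteration order "least recently used first",
    // so removeEldestEntry evicts the LRU entry once capacity is exceeded.
    private val entries = object : LinkedHashMap<String, V>(16, 0.75f, true) {
        override fun removeEldestEntry(
            eldest: MutableMap.MutableEntry<String, V>?
        ): Boolean = size > capacity
    }

    fun put(prefix: String, state: V) {
        entries[prefix] = state
    }

    // Returns the cached prefix covering the most of `prompt`, or null on a miss.
    // Only the text after the matched prefix would still need prefill.
    fun longestMatch(prompt: String): Pair<String, V>? {
        // Pick the key first; calling get() while iterating an access-ordered
        // map would modify it mid-iteration.
        val best = entries.keys
            .filter { prompt.startsWith(it) }
            .maxByOrNull { it.length } ?: return null
        val state = entries[best] ?: return null // get() marks it recently used
        return best to state
    }
}

fun main() {
    val cache = PrefixLruCache<String>(capacity = 2)
    val assistant = "SYSTEM: You are a helpful assistant.\n"
    val french = "SYSTEM: You answer in French.\n"

    cache.put(assistant, "kv-state-A")
    cache.put(french, "kv-state-B")
    cache.longestMatch(assistant + "USER: hi")      // touches A, so B is now LRU
    cache.put("SYSTEM: You are terse.\n", "kv-state-C") // over capacity: evicts B

    println(cache.longestMatch(french + "USER: salut")?.second)  // null
    println(cache.longestMatch(assistant + "USER: hello")?.second) // kv-state-A
}
```

The capacity bound is what keeps memory and disk use predictable: a frequently reused system prompt stays resident, while one-off prefixes age out.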
Where to go next
- Quick Start: install and run your first generation.
- Model Loading: manifest-based downloads, sideloaded GGUFs, `ModelLoadingOptions` reference, KV cache configuration.
- Conversation & Generation: the streaming generation API.
- Constrained Generation: `@Generatable` types and JSON-schema-enforced output.
- Function Calling: tool definitions, function-call parsing, hermes / pythonic dialects.
- Voice Assistant Widget: drop-in Compose Multiplatform voice UI.
- OpenAI-Compatible Client: hybrid on-device + cloud routing.
- Migrating from 0.9.x? See the unification story and drop-in replacements for legacy `Leap.load(...)` / `LiquidEngine(...)` call sites.