Overview - Liquid Docs

The Leap SDK is Liquid AI’s official on-device inference SDK and the only SDK with first-class support for Liquid Foundation Models (LFMs): LFM2, LFM2.5 (text, thinking, JP, VL), and LFM2.5-Audio. “First-class” means every published Liquid checkpoint is supported, validated, and shipped through this SDK on day one. The same team that trains the models ships the engine, sampler defaults, chat templates, and tool-call parsers that run them, so there is no separate adapter layer, no community port, and no lag waiting on upstream rebases.

It’s also a Kotlin Multiplatform library: the same ModelRunner / Conversation / MessageResponse API runs on iOS, macOS, Android, JVM desktop, Linux native, Windows native, and (in preview) wasmJs. The Swift surface is generated through Kotlin/Native + SKIE and ships as XCFrameworks; the Android/JVM surface ships as Maven Central artifacts. The call shapes are identical on both; only the language and packaging differ.

Ready to install?

Jump to the Quick Start: install via SPM or Gradle, load a model, and stream a response.

What “first-class support for Liquid models” gets you

Day-one model coverage. New LFM checkpoints land in the SDK release that announces them. There is no waiting for a generic runtime to catch up to a new architecture, no manual quant conversion, and no template-mismatch debugging. The LEAP Model Library is the canonical distribution path, and the SDK pulls directly from it.
Per-checkpoint validated defaults. The sampling parameters baked into each model’s bundle manifest (sampling_parameters under generation_time_parameters in each <Quant>.json on LiquidAI/LeapBundles) are the values the training team validated for that exact checkpoint. The SDK applies them automatically, so you don’t have to retune from a generic placeholder such as temperature=0.7, and you avoid token-stream artifacts caused by a mismatched min_p or repetition_penalty.
LFM-native special tokens and chat templates. The shipped engine knows how to filter LFM control tokens before they reach your stream, applies the right chat template per checkpoint, and parses LFM’s hermes and pythonic function-call dialects out of the box. Generic SDKs treat these as opaque text and surface raw tokens; Leap surfaces typed MessageResponse.FunctionCalls with parsed argument maps.
Multimodal LFMs in one API. Vision (LFM2-VL family) and audio (LFM2.5-Audio) plug into the same ChatMessage / ChatMessageContent types you already use for text. Image inputs travel as JPEG bytes; audio inputs travel as WAV blobs (or as raw float32 PCM on Kotlin via AudioPcmF32). For audio-out checkpoints, MessageResponse.AudioSample streams float32 PCM frames. There is no separate runtime per modality.
Constrained generation, end-to-end. Kotlin annotations (@Generatable / @Guide on @Serializable data classes) and Swift macros (@Generatable / @Guide synthesizing jsonSchema() at compile time) produce JSON Schemas the engine enforces at decode time. The model’s output is guaranteed to parse into your type.
One-call model fetching from the LEAP Model Library. LeapModelDownloader.loadModel(modelName:, quantizationType:) resolves a manifest, downloads the right GGUF plus the matching mmproj/audio-decoder companion files for the checkpoint, caches them on disk, and hands back a ModelRunner. One call replaces manual path wiring and companion-file detection. Downloads are background-safe on iOS (URLSessionConfiguration.background(withIdentifier:)) and WorkManager-backed on Android, so they survive app restarts.
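The min_p and repetition_penalty values mentioned above are standard sampling knobs. As a rough illustration of what they control, here is a generic Kotlin sketch of both techniques applied to one vector of logits. It assumes the common definitions (min_p keeps only tokens whose probability is at least min_p times the top token’s probability; repetition_penalty divides positive logits and multiplies negative logits of already-generated tokens). The Leap engine’s sampler may implement them differently, and softmax, applyRepetitionPenalty, and minPFilter are hypothetical names.

```kotlin
import kotlin.math.exp

// Softmax with max-subtraction for numerical stability.
fun softmax(logits: DoubleArray): DoubleArray {
    val max = logits.maxOrNull() ?: return DoubleArray(0)
    val exps = logits.map { exp(it - max) }
    val sum = exps.sum()
    return exps.map { it / sum }.toDoubleArray()
}

// Common (CTRL-style) repetition penalty: tokens that already appeared become
// less likely. Positive logits are divided by the penalty, negative ones multiplied.
fun applyRepetitionPenalty(logits: DoubleArray, seenTokens: Set<Int>, penalty: Double): DoubleArray {
    val out = logits.copyOf()
    for (t in seenTokens) {
        if (t !in out.indices) continue
        out[t] = if (out[t] > 0) out[t] / penalty else out[t] * penalty
    }
    return out
}

// min_p filtering: zero out tokens below minP * (top probability), then renormalize.
// The top token always survives as long as minP <= 1.
fun minPFilter(probs: DoubleArray, minP: Double): DoubleArray {
    val threshold = (probs.maxOrNull() ?: return probs) * minP
    val kept = probs.map { if (it >= threshold) it else 0.0 }
    val total = kept.sum()
    return kept.map { it / total }.toDoubleArray()
}
```

With min_p = 0.1, a token needs at least a tenth of the top token’s probability to survive. A value set too low admits low-quality tail tokens, while one set too high pushes output toward greedy decoding, which is why per-checkpoint defaults matter.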
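For readers curious what parsing the pythonic function-call dialect involves, here is a minimal Kotlin sketch that turns a call string such as [get_weather(city="Boston", days=3)] into a function name and an argument map. It assumes keyword arguments only, with string, integer, float, and True/False/None literals. It does not handle escaped quotes, nested lists or dicts, or the control tokens that wrap calls in raw LFM output. ParsedCall and parsePythonicCall are hypothetical names; the SDK already surfaces typed MessageResponse.FunctionCalls, so you would not normally write this yourself.

```kotlin
// Hypothetical result type for illustration only.
data class ParsedCall(val name: String, val arguments: Map<String, Any?>)

// Matches name(args), optionally wrapped in [ ... ].
val pythonicCallRegex = Regex("""^\s*\[?\s*([A-Za-z_][A-Za-z0-9_]*)\s*\((.*)\)\s*]?\s*$""", RegexOption.DOT_MATCHES_ALL)

fun parsePythonicCall(text: String): ParsedCall {
    val match = pythonicCallRegex.matchEntire(text)
        ?: throw IllegalArgumentException("Not a pythonic call: $text")
    val args = linkedMapOf<String, Any?>()
    for (part in splitTopLevel(match.groupValues[2])) {
        val eq = part.indexOf('=')
        require(eq > 0) { "Expected key=value, got: $part" }
        args[part.substring(0, eq).trim()] = parseLiteral(part.substring(eq + 1).trim())
    }
    return ParsedCall(match.groupValues[1], args)
}

// Splits on commas that are not inside a quoted string (escaped quotes are not handled).
fun splitTopLevel(s: String): List<String> {
    val parts = mutableListOf<String>()
    val current = StringBuilder()
    var quote: Char? = null
    for (c in s) {
        when {
            quote != null -> {
                current.append(c)
                if (c == quote) quote = null
            }
            c == '"' || c == '\'' -> { quote = c; current.append(c) }
            c == ',' -> { parts += current.toString(); current.clear() }
            else -> current.append(c)
        }
    }
    parts += current.toString()
    return parts.filter { it.isNotBlank() }
}

// Converts a Python-style literal to a Kotlin value; unrecognized forms stay as raw strings.
fun parseLiteral(v: String): Any? = when {
    v.length >= 2 && (v[0] == '"' || v[0] == '\'') && v.last() == v[0] -> v.substring(1, v.length - 1)
    v == "True" -> true
    v == "False" -> false
    v == "None" -> null
    else -> v.toLongOrNull() ?: v.toDoubleOrNull() ?: v
}
```

Splitting only on top-level commas is what lets a quoted value like "Boston, MA" survive intact.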
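Audio inputs travel as WAV blobs. If your capture pipeline produces float32 PCM but you need a WAV blob, the conversion is a 44-byte RIFF header followed by the samples. The sketch below assumes 16-bit integer PCM output and samples in [-1, 1]. Whether the SDK prefers a particular WAV encoding or sample rate is not stated here, and floatPcmToWav is a hypothetical helper. On Kotlin, AudioPcmF32 accepts raw float32 PCM directly, so this matters mainly where a WAV blob is required.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Wraps float32 samples (mono, or interleaved for multi-channel) into a 16-bit PCM WAV file.
fun floatPcmToWav(samples: FloatArray, sampleRate: Int, channels: Int = 1): ByteArray {
    val bytesPerSample = 2
    val dataSize = samples.size * bytesPerSample
    val buf = ByteBuffer.allocate(44 + dataSize).order(ByteOrder.LITTLE_ENDIAN)
    buf.put("RIFF".toByteArray(Charsets.US_ASCII))
    buf.putInt(36 + dataSize)                             // RIFF chunk size
    buf.put("WAVE".toByteArray(Charsets.US_ASCII))
    buf.put("fmt ".toByteArray(Charsets.US_ASCII))
    buf.putInt(16)                                        // fmt chunk size for PCM
    buf.putShort(1.toShort())                             // audio format 1 = integer PCM
    buf.putShort(channels.toShort())
    buf.putInt(sampleRate)
    buf.putInt(sampleRate * channels * bytesPerSample)    // byte rate
    buf.putShort((channels * bytesPerSample).toShort())   // block align
    buf.putShort(16.toShort())                            // bits per sample
    buf.put("data".toByteArray(Charsets.US_ASCII))
    buf.putInt(dataSize)
    for (s in samples) {
        // Clamp to [-1, 1] and scale to the signed 16-bit range.
        val clamped = s.coerceIn(-1f, 1f)
        buf.putShort((clamped * Short.MAX_VALUE).toInt().toShort())
    }
    return buf.array()
}
```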

Other features

On-device by default. No cloud round-trip, no per-token cost, full privacy, full offline operation.
KV cache reuse for fast multi-turn. CacheOptions provide a bounded LRU cache, on disk and in memory, that skips the prefill step for shared prompt prefixes. On a cache hit, time to first token (TTFT) for a long system prompt or RAG preamble drops from seconds to under a hundred milliseconds. Caching is disabled by default; opt in with LiquidCacheOptions.enabled(path:) / ModelLoadingOptions.cacheOptions(path = ...).
Memory-mapped weight loading. use_mmap=true has been the default since v0.10.4. Model weights are file-backed rather than anonymous RSS, so iOS jetsam and the Android low-memory killer (LMK) score the app much lower under memory pressure, a cold load returns as soon as the file is mapped, and warm reloads stream from the kernel page cache.
Hybrid on-device + cloud routing. leap-openai-client ships in the same release as an opt-in OpenAI-compatible chat-completions client (OpenAI, OpenRouter, vLLM, llama-server). One binary serves two code paths: route small, fast prompts on-device, fall back to a cloud model for hard ones, and share the same ChatMessage types across both.
Drop-in voice assistant UI. leap-ui ships a Compose Multiplatform voice widget (animated orb, mic button, status label, and state machine) that pairs with VoiceConversation to wire LFM2.5-Audio into a working voice experience without writing the recording-and-playback plumbing yourself.
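The KV cache reuse described above rests on two ideas: find the longest cached prefix of a new prompt, and evict least-recently-used entries when the cache is full. Here is a minimal in-memory Kotlin sketch of that lookup, assuming prompts are lists of token IDs and using an opaque state value in place of real KV tensors. PrefixCache is hypothetical; it is not how CacheOptions is implemented, and it omits the on-disk tier.

```kotlin
// Hypothetical prefix cache: token-ID prefix -> saved state (stand-in for KV tensors).
class PrefixCache<S : Any>(private val maxEntries: Int) {
    // accessOrder = true keeps entries ordered from least to most recently used,
    // and removeEldestEntry evicts the LRU entry once the bound is exceeded.
    private val entries = object : LinkedHashMap<List<Int>, S>(16, 0.75f, true) {
        override fun removeEldestEntry(eldest: MutableMap.MutableEntry<List<Int>, S>?): Boolean =
            size > maxEntries
    }

    fun put(prefix: List<Int>, state: S) {
        entries[prefix.toList()] = state // copy so later mutation of the caller's list is harmless
    }

    // Returns (matched length, state) for the longest cached prefix of the prompt, or null.
    // Tries lengths from longest to shortest; a hit also refreshes that entry's LRU position.
    fun longestPrefix(prompt: List<Int>): Pair<Int, S>? {
        for (len in prompt.size downTo 1) {
            val hit = entries[prompt.subList(0, len)] ?: continue
            return len to hit
        }
        return null
    }
}
```

On a hit of length n, an engine would restore the saved state and prefill only the remaining tokens, which is where the TTFT savings come from. The scan over every prefix length is quadratic in hashing cost and is only meant to show the idea.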
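To see what file-backed loading means at the OS level, here is a small JVM sketch that memory-maps a file read-only with FileChannel.map. The OS pages bytes in lazily on first access and can drop clean pages under pressure instead of treating them as anonymous memory. This only illustrates the mechanism and is not the SDK’s loader; mapReadOnly is a hypothetical helper. A single MappedByteBuffer is capped at Integer.MAX_VALUE bytes, so multi-gigabyte model files would have to be mapped in chunks.

```kotlin
import java.io.File
import java.nio.MappedByteBuffer
import java.nio.channels.FileChannel
import java.nio.file.StandardOpenOption

// Maps a whole file read-only; nothing is copied into the heap up front.
fun mapReadOnly(file: File): MappedByteBuffer =
    FileChannel.open(file.toPath(), StandardOpenOption.READ).use { channel ->
        // Per the FileChannel docs, the mapping stays valid after the channel is closed.
        channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size())
    }
```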
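One possible shape for the hybrid on-device + cloud routing described above is a router that sends short prompts to the on-device backend, sends long ones to the cloud, and falls back to the cloud if the local call fails. ChatBackend, HybridRouter, and the character-count threshold are all hypothetical. In a real app the two backends would wrap an on-device Conversation and leap-openai-client, whose actual APIs are not shown here.

```kotlin
// Hypothetical abstraction over "something that can answer a prompt".
fun interface ChatBackend {
    fun complete(prompt: String): String
}

// Hypothetical routing policy: prompt length as a crude proxy for difficulty.
class HybridRouter(
    private val onDevice: ChatBackend,
    private val cloud: ChatBackend,
    private val maxLocalChars: Int = 2_000,
) {
    fun complete(prompt: String): String =
        if (prompt.length <= maxLocalChars) {
            // Try on-device first; any failure falls back to the cloud backend.
            // (runCatching catches every Throwable, which a production router should narrow.)
            runCatching { onDevice.complete(prompt) }.getOrElse { cloud.complete(prompt) }
        } else {
            cloud.complete(prompt)
        }
}
```

Prompt length is only a stand-in; a production policy might also weigh tool use, required context size, or network availability.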

Where to go next

Quick Start — install and run your first generation.
Model Loading — manifest-based downloads, sideloaded GGUFs, ModelLoadingOptions reference, KV cache configuration.
Conversation & Generation — the streaming generation API.
Constrained Generation — @Generatable types and JSON-schema-enforced output.
Function Calling — tool definitions, function-call parsing, hermes / pythonic dialects.
Voice Assistant Widget — drop-in Compose Multiplatform voice UI.
OpenAI-Compatible Client — hybrid on-device + cloud routing.
Migrating from 0.9.x? — the unification story and drop-in replacements for legacy Leap.load(...) / LiquidEngine(...) call sites.
