Latest release: v0.10.6 (GitHub). This page covers user-visible changes in the LEAP SDK across releases. For per-build commit detail, see the release notes on Liquid4All/leap-sdk.
0.9.x → 0.10.x: Kotlin Multiplatform unification
Starting with v0.10.0, the LEAP SDK ships from a single Kotlin Multiplatform codebase. The two previously separate distributions (the Android-only ai.liquid.leap:* Maven artifacts and the iOS-only Liquid4All/leap-ios Swift package) were collapsed into one source tree that publishes to:
- Swift Package Manager — Liquid4All/leap-sdk (new repo, for iOS / macOS consumers).
- Maven Central — ai.liquid.leap:* (Android, JVM, and Kotlin/Native targets).
The Liquid4All/leap-ios repository is no longer the iOS source of truth. Existing 0.9.x Swift call sites (Leap.load(...), Conversation.generateResponse(...), etc.) keep compiling thanks to a Swift compatibility layer, but new code should adopt the unified APIs documented in the Quick Start.
Five SPM products / four Maven artifacts
The unified KMP package vends a richer surface than the 0.9.x distributions:

| SPM product | Maven artifact | Purpose |
|---|---|---|
| LeapSDK | ai.liquid.leap:leap-sdk | Core inference + conversation API |
| LeapModelDownloader | ai.liquid.leap:leap-model-downloader | Hosted / manifest-based model fetch |
| LeapOpenAIClient | ai.liquid.leap:leap-openai-client | OpenAI-compatible cloud chat client (new in 0.10.0) |
| LeapUI | ai.liquid.leap:leap-ui | Voice assistant widget — Compose Multiplatform (new in 0.10.0) |
| LeapSDKMacros | (Swift only) | @Generatable / @Guide constrained-generation macros |
Breaking changes for iOS consumers
- SPM URL change. Point your Swift Package Manager dependency at https://github.com/Liquid4All/leap-sdk.git (not the deprecated leap-ios repo).
- CocoaPods removed. The SDK ships exclusively through SPM from v0.10.0 onward.
- Toolchain bump. Xcode 16 and Swift 6.0 are required.
- ModelDownloader → LeapModelDownloader. The downloader class was renamed; update call sites accordingly. See Model Loading for the 0.10.x constructor signature.
Major additions since 0.9.x
The features below were introduced in the 0.10.x line.

OpenAI-compatible cloud client
LeapOpenAIClient / leap-openai-client is a small, dependency-light client for any OpenAI-compatible chat completions endpoint — OpenAI itself, OpenRouter, vLLM, llama-server, or your own proxy. It ships in the same package so you can route requests between an on-device LFM and a cloud model from a single app.
See OpenAI-Compatible Client.
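A minimal usage sketch. This page names the LeapOpenAIClient product but not its API, so the initializer labels, the .user(_) message factory, and chatCompletion(model:messages:) below are illustrative assumptions; see the OpenAI-Compatible Client page for the documented surface.

```swift
import Foundation
import LeapOpenAIClient

// Hypothetical sketch: point the client at any OpenAI-compatible endpoint
// (vLLM, llama-server, OpenRouter, a proxy) and request a completion.
func askCloud(_ prompt: String) async throws -> String {
    let client = LeapOpenAIClient(
        baseUrl: URL(string: "http://localhost:8080/v1")!,  // assumed label
        apiKey: "sk-placeholder"                            // assumed label
    )
    return try await client.chatCompletion(                 // assumed method
        model: "my-hosted-model",
        messages: [.user(prompt)]                           // assumed factory
    )
}
```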
Voice assistant widget
LeapUI / leap-ui is a Compose Multiplatform module that ships a drop-in voice assistant widget — an animated orb, mic button, and status label — backed by a state machine that handles recording, generation, and audio playback. Stable on iOS, macOS, Android, and JVM; Wasm/Web is present in the source tree as a preview.
See Voice Assistant Widget.
Sideloading models from explicit paths
LeapDownloader.loadSimpleModel and LeapModelDownloader.loadSimpleModel load a model from explicit resource paths or URLs without going through the LEAP Model Library manifest. Useful for ADB-pushed bundles, app-bundled models, or any setup where you’ve already placed the model files on disk.
See Model Loading — Sideloaded files.
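A sketch of sideloading an app-bundled model via the Swift entry point that reaches parity in v0.10.6 (see the notes below). The loadSimpleModel labels come from those notes; the ModelSource(path:) construction and the ModelRunner return type are assumptions, and the file name is a placeholder.

```swift
import Foundation
import LeapModelDownloader

// Sideload an app-bundled model, skipping the Model Library manifest.
func loadBundledModel() async throws -> ModelRunner {
    let bundled = Bundle.main.url(forResource: "lfm2-1.2b-q4", withExtension: "bundle")!
    let downloader = ModelDownloader()
    return try await downloader.loadSimpleModel(
        model: ModelSource(path: bundled.path),  // assumed local-source spelling
        options: nil,                            // LiquidInferenceEngineManifestOptions?
        generationTimeParameters: nil,
        downloadProgress: nil                    // local paths pass straight through
    )
}
```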
iOS background downloads
The iOS / macOS Swift ModelDownloader(sessionConfiguration:) initializer accepts an optional URLSessionConfiguration so downloads can continue when the app is suspended or killed. See Model Loading → Constructing the downloader.
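For example, a downloader constructed with a background session (the identifier is an arbitrary placeholder; isDiscretionary and sessionSendsLaunchEvents are standard URLSessionConfiguration knobs, not LEAP-specific):

```swift
import Foundation
import LeapModelDownloader

// Background-capable downloader: transfers survive suspension, and the
// system can relaunch the app when they finish.
let config = URLSessionConfiguration.background(withIdentifier: "com.example.leap.downloads")
config.isDiscretionary = false           // start promptly rather than deferring
config.sessionSendsLaunchEvents = true   // wake the app for completion events
let downloader = ModelDownloader(sessionConfiguration: config)
```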
autoDetectCompanionFiles
Leap.load(url:options:) on iOS gained an autoDetectCompanionFiles: Bool = true parameter that picks up companion files sitting next to the model file (e.g. multimodal projection weights).
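For instance, to load exactly the named file and skip companion detection. This is a sketch: the async-throwing shape, the optional options parameter, and the ModelRunner return type are assumed from the surrounding notes.

```swift
import Foundation
import LeapSDK

// autoDetectCompanionFiles defaults to true, which also maps sibling files
// (e.g. multimodal projection weights); false loads only the file at modelURL.
func loadExactFile(_ modelURL: URL) async throws -> ModelRunner {
    try await Leap.load(
        url: modelURL,
        options: nil,
        autoDetectCompanionFiles: false
    )
}
```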
Swift ergonomics
- Compatibility layer keeps 0.9.x call sites (Leap.load(...), Conversation.generateResponse(...)) compiling on top of the unified KMP surface.
- onEnum(of:) — SKIE-bridged sealed-class switching for Kotlin enums and sealed hierarchies (e.g. MessageResponse); sketch after this list.
- ChatMessageContent static factories — .text(...), .image(...), .audio(...) helpers instead of constructor calls.
- Builder-style options — LiquidInferenceEngineOptions.with(cacheOptions:), etc.
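A sketch of the onEnum(of:) pattern over a streamed response. The streaming shape of generateResponse, the MessageResponse case names (chunk, complete), and the .text payload property are assumptions for illustration; consult the generated Swift interface for the exact hierarchy.

```swift
import LeapSDK

// Exhaustively switch over a Kotlin sealed class from Swift via SKIE's
// onEnum(of:). Case names below are assumed, not documented on this page.
func streamReply(_ conversation: Conversation, _ message: ChatMessage) async throws {
    for try await response in conversation.generateResponse(message: message) {
        switch onEnum(of: response) {
        case .chunk(let chunk):          // assumed case: partial text delta
            print(chunk.text, terminator: "")
        case .complete:                  // assumed case: generation finished
            print("\n[complete]")
        default:
            break                        // reasoning / function-call cases, etc.
        }
    }
}
```

The ChatMessageContent factories listed above (.text(...), .image(...), .audio(...)) are the intended way to build the message payloads fed into calls like this.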
Memory-mapped model loading by default
Starting in v0.10.4, and on by default through the current release, the inference engine loads model weights via mmap (use_mmap=true) for every model. On mobile this is the most user-visible runtime change in the 0.10.x line. A public opt-out arrived in v0.10.5 as ModelLoadingOptions.useMmap: Boolean? (Kotlin) / LiquidInferenceEngineOptions(useMmap:) (Swift) — leave it null/nil to keep the default, or set it to false on filesystems where mmap misbehaves (some Android scoped-storage paths, certain network mounts).
What changed. Previously the engine read(2)-ed the entire model file into a heap-allocated buffer before running prefill. Now it memory-maps the file: the kernel maps the on-disk weights into the process’s virtual address space and loads pages lazily as they’re accessed.
Performance implications on mobile:
- Lower private RSS. mmap’d weights are file-backed pages, not “anonymous private” RSS. iOS’s jetsam and Android’s low-memory killer both score apps primarily by anonymous RSS, so a 1.2B-Q4 model that previously counted as ~700 MB of dirty heap now appears as file-backed pages the OS can evict for free. Foreground apps are significantly less likely to be terminated under memory pressure.
- Faster cold load. The constructor returns as soon as the file is mapped — typically tens of milliseconds — instead of waiting for the entire model to be read into RAM. The first inference pays the page-in cost incrementally as the engine touches weights.
- Faster warm reloads. After the first load, the kernel’s page cache holds the model’s hot pages. Re-creating a runner on the same model (e.g. after a background termination and relaunch within the same boot) is near-instant — pages stream from the page cache, not disk.
- Multi-model sharing. Two processes (or two runners in one process) loading the same model file share physical pages via the page cache, with no extra RAM cost.
- Graceful memory pressure. When the OS needs RAM for the foreground UI or another app, it can drop read-only model pages without writing them anywhere (they’re backed by the file). The next access re-pages them in. With anonymous heap buffers, the kernel had to choose between swapping (slow) or killing the app (worse).
- First-token latency on cold pages. The first generation against a freshly mapped model triggers page faults as the engine walks the weights, adding disk-I/O latency to TTFT on the first call after process start. The KV cache reuse documented in the next section compounds well here: cached prefixes skip both the prefill compute and the page-fault cost for weights touched during prefill.
- Storage type matters. On devices with slow eMMC / external SD storage, lazy page-in can be noticeably slower than the old eager-read flow that loaded the whole file once. Internal flash on every shipped iOS device and any modern Android device is fast enough that this isn’t visible in practice.
- Opt-out available since v0.10.5. Pass useMmap = false on ModelLoadingOptions (Kotlin) or LiquidInferenceEngineOptions(useMmap: false) (Swift) to force the legacy full-read loader; see the sketch below. Use it only where mmap misbehaves on the target filesystem; the default of null/nil keeps the engine default.
(The mmap default arrived with the vendored inference-engine pin v26.02.1-79-ge5f65988dc; it has stayed on through every subsequent pin cascade.)
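A sketch of the opt-out from Swift, using the documented LiquidInferenceEngineOptions(useMmap:) initializer and the .with(useMmap:) builder from v0.10.5. The parameterless LiquidInferenceEngineOptions() init in the second line is an assumption, and how the options value reaches your load call depends on the entry point (see Model Loading).

```swift
import LeapSDK

// Force the legacy full-read loader for a model on a filesystem where
// mmap misbehaves. Leaving the knob unset keeps the engine default (mmap on).
let options = LiquidInferenceEngineOptions(useMmap: false)

// Builder-style equivalent, per the v0.10.5 notes:
let tuned = LiquidInferenceEngineOptions().with(useMmap: false)
```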
KV cache reuse across generations
CacheOptions (new in v0.10.4, ergonomic Swift surface in v0.10.4.3) lets the engine persist KV-cache data between generateResponse calls so requests that share a prompt prefix skip the prefill work for the shared tokens.
Why it matters. Transformer inference has two phases: prefill (compute keys and values for every prompt token) and decode (generate one new token at a time, reusing those K/V vectors). On mobile, prefill dominates time-to-first-token for any prompt longer than a few hundred tokens. With CacheOptions enabled, a previously seen prefix is read from disk instead of recomputed — TTFT can drop from seconds to under a hundred milliseconds on cache hits. Per-token decode cost is unchanged.
When it speeds things up. Anywhere the same tokens appear at the start of many requests — see the sketch after this list:
- Multi-turn chat with a long system prompt. Every turn reuses the system prompt and earlier turns.
- RAG / retrieval-augmented generation. Many queries share the retrieved-document preamble.
- Few-shot prompting. A fixed set of examples precedes every request.
- Agent loops. Tool definitions, role instructions, and task scaffold are stable across iterations.
- Voice assistant continuations. Conversation history grows; everything before the latest user turn is cacheable.
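A sketch of enabling disk-backed cache reuse from Swift, using the documented v0.10.4.3 surface (LiquidCacheOptions.enabled(path:) and the cacheOptions:-accepting convenience init). The cache directory is an arbitrary choice, and how the options value reaches your loader depends on the entry point.

```swift
import Foundation
import LeapSDK

// Persist KV-cache entries under Caches/ so prompts sharing a prefix skip
// prefill on later generateResponse calls.
let cacheDir = FileManager.default
    .urls(for: .cachesDirectory, in: .userDomainMask)[0]
    .appendingPathComponent("leap-kv-cache").path

let options = LiquidInferenceEngineManifestOptions(
    cacheOptions: LiquidCacheOptions.enabled(path: cacheDir),
    contextSize: 4096
)
// Pass `options` to your load entry point; generations that share a prompt
// prefix then read cached K/V data instead of re-running prefill.
```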
Per-release notes
v0.10.6 — 2026-05-12
On iOS, ModelDownloader (the Swift class formerly known as LeapModelDownloader — see the rename note below) reaches parity with the cross-platform LeapDownloader. Callers no longer need to pair the two classes to download and load a model on Apple platforms — every entry point routes file transfer through URLSession and then hands off to the loader.
New iOS API on ModelDownloader:
- loadModel(modelName:, quantizationType:, options:, generationTimeParameters:, forceDownload:, downloadProgress:) — downloads (when needed) and loads in one call; see the end-to-end sketch at the end of this list. The transfer registers in queryStatus, is cancellable via requestStopDownload, and continues across backgrounding when constructed with sessionConfiguration: .backgroundSessionConfiguration(withIdentifier:).
- loadModel(manifestUrl:, options:, generationTimeParameters:, forceDownload:, downloadProgress:) — same flow keyed by a manifest URL.
- loadSimpleModel(model: ModelSource, options:, generationTimeParameters:, downloadProgress:) — sideload from explicit paths or URLs; HTTPS sources stream through URLSession, local paths pass straight through.
- forceDownload: Bool = false on all three load methods. Resolves the manifest first, then deletes the local cache, then re-downloads — a registry failure on resolve leaves the previously-working cached copy intact.
- Resource-lookup helpers that previously lived only on the cross-platform LeapDownloader: getModelResourceFolder(...), getCachedManifest(...), getCachedFilePath(...), resolve(...), deleteModelFile(...).
- requestDownloadModel(manifestUrl:, forceDownload:) overload for symmetry with removeModel(manifestUrl:) / queryStatus(manifestUrl:).
- Class renamed: LeapModelDownloader → ModelDownloader on iOS / macOS. The Kotlin class still lives in ai.liquid.leap.downloader.LeapModelDownloader and Android consumers are unaffected; a @ObjCName(swiftName = "ModelDownloader") annotation gives the Swift export an unambiguous name distinct from the framework module’s own LeapModelDownloader name. Update Swift call sites from LeapModelDownloader(...) to ModelDownloader(...).
- Parameter labels renamed across the iOS ModelDownloader surface — model: / quantization: → modelName: / quantizationType: on every method that already existed: downloadModel(...), requestDownloadModel(...), requestStopDownload(...), queryStatus(...), removeModel(...), getModelSize(...). Every loader now uses the same labels across Swift and Kotlin — ModelDownloader (iOS, macOS), LeapModelDownloader (Android), and LeapDownloader (cross-platform) all share modelName: / quantizationType:.
- The LeapModelDownloader SPM library product is now single-target. It no longer bundles the LeapSDK target. Apps depending on this product must drop any direct LeapSDK SPM dependency from the same target — import LeapModelDownloader re-exports every LeapSDK Kotlin type (Conversation, ModelRunner, ChatMessage, Leap, the convenience extensions, …). Keeping both library products on the same target double-bundles the inference engine dylibs and triggers a build-time #error from the LMD umbrella header (see the dual-import guard below); the LeapUI library product still bundles LeapSDK because LeapUI does not re-emit those types in its ObjC binding.
- LeapModelDownloader.xcframework is now a dynamic framework. It was a static archive in 0.10.5. SPM applies Embed & Sign automatically; manual integrators must add the framework with “Embed & Sign” instead of “Do Not Embed”. The shipped XCFramework now also bundles the inference engine dylibs (libinference_engine.dylib, libinference_engine_llamacpp_backend.dylib, libie_zip.dylib) under Frameworks/ with an @loader_path/Frameworks LC_RPATH — consumers using LMD on its own no longer need LeapSDK.framework/Frameworks on their search path.
- Dual-import build-time guard. LMD’s umbrella header carries a __has_include(<LeapSDK/LeapSDK.h>) && !defined(LEAP_DUAL_IMPORT_ALLOW) check that fires #error at the consumer’s preprocessing time when both LeapSDK and LeapModelDownloader frameworks are reachable in the same target. To opt out for legitimate combinations (e.g. transitive linkage via LeapUI), add LEAP_DUAL_IMPORT_ALLOW=1 to OTHER_CFLAGS.
- New Swift convenience initializers: ModelDownloader(), ModelDownloader(sessionConfiguration:), ModelDownloader(config:) — the Kotlin/Native ObjC export strips default-argument metadata, so 0.10.5 forced Swift callers to pass every parameter (and LeapDownloaderConfig has seven). These new SKIE-bundled convenience inits restore the parameterless / single-arg forms.
- LeapDownloaderConfig() parameterless convenience init mirroring the Kotlin defaults (saveDir = "leap_models", validateSha256 = true, etc.). Same rationale — LeapDownloaderConfig is a Kotlin data class with seven defaulted fields that the ObjC export couldn’t carry through.
- requestDownloadModel(forceDownload: false) now short-circuits when a cached manifest already exists and every resource referenced by that manifest is present on disk — matches both the Android downloader’s idempotent-call semantics and what queryStatus(...) already reports. Earlier 0.10.5 builds would short-circuit on the manifest alone, leaving the caller stuck if any resource file had been removed. Pass forceDownload: true to re-download on top of a cache.
- Cached-file lookup uses Ktor URL parsing instead of substring slicing, so URLs with fragments or query strings now produce the same filename the loader expects (getCachedFilePath was previously brittle for those shapes).
- getAvailableDiskSpace() previously returned null on every Apple platform because the internal NSFileManager.attributesOfFileSystem(forPath:)[.systemFreeSize] extraction cast through as? Long (Kotlin), which never matches the bridged NSNumber. It now goes through NSNumber.longLongValue and reports the real free-space figure.
- Options on the new load methods take LiquidInferenceEngineManifestOptions? (the Swift-friendly type already used by Leap.load), with toModelLoadingOptions() / toGenerationTimeParameters() conversion at the boundary — no separate KMP options class needed from Swift.
- Internal: the iOS class uses kotlin.experimental.ExperimentalObjCName (stable in our Kotlin 2.3.20 baseline but still formally experimental).
- No public-API changes for Android or non-Apple Kotlin/Native targets.
v0.10.5 — 2026-05-11
Headline additions: the Android Leap Model Service for cross-app model sharing, the useMmap knob on ModelLoadingOptions, and a parameter-name cleanup on LeapDownloader.loadModel so it matches LeapModelDownloader.loadModel.
Breaking changes (Kotlin):
- ModelLoadingOptions.cacheDir: String? → cacheOptions: EngineOptions.CacheOptions? — KV-cache configuration moves to a bounded-LRU CacheOptions value with an explicit enabled master switch, per-tier caps (maxEntriesDisk, maxEntriesMemory, maxBytesMemory), and optional diskDisabled = true for memory-only mode. Migrate via the ModelLoadingOptions.cacheOptions(path = ...) factory (preserves the historical 40-entry disk budget and sets enabled = true). Constructing a raw CacheOptions requires enabled = true — a positive maxEntries alone is no longer sufficient.
- LeapDownloader.loadModel(modelName, quantizationSlug, modelLoadingOptions, …) → loadModel(modelName, quantizationType, options, …) — parameter renames bring LeapDownloader in line with LeapModelDownloader. The same rename applies to loadSimpleModel(model, modelLoadingOptions, …) → loadSimpleModel(model, options, …) and loadModelFromManifestUrl(…). Swift sites that called downloader.loadModel(modelName:, quantizationSlug:, modelLoadingOptions:) need to swap to quantizationType: / options: after upgrading; see the sketch after this list.
- progress is now nullable (progress: ((ProgressData) -> Unit)? = null) — pass null to opt out (was an empty-lambda default).
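A before/after sketch of the Swift call-site rename from the second bullet. The model identifiers are placeholders, and downloader / options are assumed to be in scope; the async-throwing shape is also an assumption.

```swift
// 0.10.4.x call site:
// let runner = try await downloader.loadModel(
//     modelName: "LFM2-1.2B",
//     quantizationSlug: "Q4_K_M",
//     modelLoadingOptions: options
// )

// 0.10.5 call site, labels aligned with LeapModelDownloader:
let runner = try await downloader.loadModel(
    modelName: "LFM2-1.2B",
    quantizationType: "Q4_K_M",
    options: options
)
```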
Additions:
- ModelLoadingOptions.useMmap: Boolean? = null — exposes the engine’s use_mmap toggle to Kotlin/Swift callers. null (the default) defers to the engine default of true. Set false only on filesystems where mmap misbehaves (some Android scoped-storage paths, certain network mounts). On Swift, LiquidInferenceEngineOptions gained a matching .with(useMmap:) builder. Previously mmap could not be disabled from the SDK.
- Leap Model Service (Android) — leap-model-service is a new optional Android service that hosts loaded models in its own process and lets multiple client apps share them. Apps using LeapModelDownloader.loadModel(...) route through the service transparently when it’s installed on the device; otherwise they fall back to in-process loading. Per-UID session quotas, a persistent foreground notification, disk-backed KV-cache reuse across cold starts, and AIDL-routed registerFunction(s). Pass forceLocal = true on LeapModelDownloader.loadModel(...) to bypass the service for testing. See Model Loading for the routing model.
- Service-side load progress — when routing through the model service, LeapModelDownloader.loadModel’s progress callback now fires for service-side downloads too (was previously local-path-only).
- Apple LeapModelDownloader internal slot names switched from quantizationSlug → quantizationType for consistency. Public Swift label names (model: / quantization:) are unchanged.
- Vendor liquid.h header refresh for Linux/MinGW K/N targets.
v0.10.4.5 — 2026-05-08
Engine ABI fix release. SPM consumers should bump to this version.

- Engine pin advanced to v26.02.1-146-g777faf0dbb — fixes a K/N + Linux free(): invalid pointer SIGABRT in liquid_string_destroy (the FFI helper was freeing the wrong pointer slot).
- Linux runtime smoke test now asserts the engine reports failure on a missing model path, guarding against silent-success regressions.
- NativeLibLoader cleanup: stdout warnings moved to System.err; loader stays kotlin-stdlib-only.
v0.10.4.4 — 2026-05-07
K/N link-time --allow-shlib-undefined fix for Linux consumers. No API changes.
v0.10.4.3 — 2026-05-07
iOS/macOS Swift convenience surface for cacheOptions:

- LiquidInferenceEngineManifestOptions(cacheOptions: ..., contextSize: 4096) now accepts native Swift types (previously the convenience init dropped cacheOptions and forced consumers to the verbose Obj-C designated init).
- New with(cacheOptions:) builders on LiquidInferenceEngineOptions and LiquidInferenceEngineManifestOptions.
- New LiquidCacheOptions.enabled(path:) static factory — Swift analog of ModelLoadingOptions.cacheOptions(path:).
v0.10.4.1 — 2026-05-07
Vendor pin refresh — bumps the inference engine to v26.02.1-142-gb4aa080538. Adds Strategy B chain-prefix replay for cold/warm bit-determinism and generalizes the Android backend native loader to Linux and Windows desktop. No public API changes.
v0.10.4 — 2026-05-06
- Bounded-LRU CacheOptions API across JVM, Android, Kotlin/Native, Apple, and wasmJs.
- use_mmap=true is now the engine default (via vendored IE pin v26.02.1-79+). Model weights are memory-mapped instead of read(2)-ed into a heap buffer. See “Memory-mapped model loading by default” above for the mobile performance impact.
- K/N Linux link fix (--allow-shlib-undefined for libinference_engine.so against modern glibc).
- Dynamic vendor pipeline + DT_NEEDED-based shipped-libs verification; inference_engine RUNPATH=$ORIGIN cascade for Linux/Windows shared vendor libs.
- NativeLibLoader cross-platform load fixes (resource extraction + Windows pre-load topo-retry).
- Three release-gate smoke tests (Linux K/N, Apple SwiftPM consumer, Windows JVM) wired into CI.
v0.10.1 — 2026-04-29
Additive fix release for Linux/MinGW Kotlin/Native consumers. Apple/SPM consumers see no API or behavior changes vs v0.10.0.

- leap-sdk Linux/MinGW K/N artifacts on Maven Central now publish a -natives.zip classifier containing the runtime .so/.dll libraries.
- New ai.liquid.leap.nativelibs Gradle plugin auto-wires the natives ZIP into consumer K/N executables.
- leap-openai-client now publishes Linux/MinGW K/N klibs.