The LEAP SDK ships three downloader classes built on the same pipeline. They differ in the platform integration they add:
| Platform | Class | What it does |
|---|---|---|
| Android | LeapModelDownloader | One-shot loadModel(...) that routes through the optional Leap Model Service when installed, plus WorkManager-backed background download staging (requestDownloadModel / observeDownloadProgress) and foreground-service notifications. |
| iOS / macOS (Swift) | ModelDownloader | One-shot loadModel(...) and loadSimpleModel(...) that route every file transfer through URLSession. Pass sessionConfiguration: .background(withIdentifier:) for downloads that survive app suspension. Also exposes the underlying downloadModel / requestDownloadModel / queryStatus lifecycle for prefetch flows. The class ships in the LeapModelDownloader SPM library product. |
| All platforms (iOS, Android, JVM, Linux native, Windows native, macOS Kotlin) | LeapDownloader | The cross-platform manifest loader. One-shot loadModel(...) and loadSimpleModel(...). No platform-native background integration — the iOS ModelDownloader and Android LeapModelDownloader classes wrap one of these internally. |
All three downloaders produce a ModelRunner and share an on-disk model cache when constructed with the same LeapDownloaderConfig.saveDir. The platform downloaders wrap a LeapDownloader internally — once a download has landed, calling LeapDownloader.loadModel(...) against the shared cache picks up the files without re-downloading.
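As a minimal Kotlin sketch of the shared-cache behavior described above — the constructor shapes, the model/quantization slugs, and every LeapDownloaderConfig field other than saveDir are assumptions for illustration:

```kotlin
// Hypothetical sketch: only the saveDir-sharing behavior is documented;
// constructor signatures and slugs here are illustrative assumptions.
val config = LeapDownloaderConfig(saveDir = "/data/user/0/com.example/files/leap-models")

// The platform downloader (Android shown) stages the download in the background.
val platformDownloader = LeapModelDownloader(context, config)
platformDownloader.requestDownloadModel("LFM2-1.2B", "Q4_K_M")

// Later, a cross-platform LeapDownloader pointed at the same saveDir
// finds the files already on disk — no re-download.
val crossPlatform = LeapDownloader(config)
val runner = crossPlatform.loadModel(modelName = "LFM2-1.2B", quantizationType = "Q4_K_M")
```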
Parameter naming. Every loader uses the same parameter labels across Swift and Kotlin:
loadModel(...) / downloadModel(...) / requestDownloadModel(...) / queryStatus(...) / removeModel(...) all use modelName: / quantizationType: on the Swift ModelDownloader (iOS, macOS), the Kotlin LeapModelDownloader (Android), and the cross-platform LeapDownloader. ModelSource (sideloaded) uses quantizationId — the field is part of the source descriptor, not a loader parameter.
Swift class vs. SPM product name (v0.10.6+). In Swift code the class is ModelDownloader; the SPM library product / framework module / import statement is LeapModelDownloader. In 0.10.5 both shared one name, which made the class effectively uninstantiable from Swift due to type-vs-module shadowing. The Kotlin class — and therefore Android consumers — still see LeapModelDownloader.
Constructing the downloader
- Swift (iOS / macOS)
- Kotlin (Android)
- Kotlin (JVM / native)
ModelDownloader() and single-arg forms are Swift convenience inits added in v0.10.6 — Kotlin/Native's ObjC export strips default-argument metadata, so without them Swift callers were forced to pass every parameter of the underlying seven-field LeapDownloaderConfig and a sessionConfiguration explicitly.
Pass nil (the default) for sessionConfiguration: to get foreground downloads. For background downloads that continue when the app is suspended or killed, pass URLSessionConfiguration.background(withIdentifier:), and forward application(_:handleEventsForBackgroundURLSession:completionHandler:) to downloader.handleBackgroundEvents(completionHandler:) so the OS can wake your app when downloads finish.
Cross-platform LeapDownloader (no background download support)
LeapDownloader is available on iOS too — same loadModel / loadSimpleModel API as Kotlin, but no URLSession background integration. Use it when you don't need background downloads.
Manifest-based loading
Resolves the GGUF manifest for the given model + quantization slug, downloads anything that isn't already cached, then loads a ModelRunner. Cached on subsequent calls.
- Swift (iOS / macOS)
- Kotlin (Android)
- Kotlin (JVM / native)
Use ModelDownloader.loadModel(...) — the transfer runs through URLSession (so it inherits background-session support when configured) and the loader picks up the on-disk files without re-downloading.
- forceDownload — refresh the on-disk copy. The manifest is resolved first; only on a successful resolve are the local resources removed and re-downloaded, so a registry hiccup leaves the previously-working cached copy intact.
- downloadProgress — fraction (0…1) and bytes/sec for the transfer. The loader's own corruption-retry fallback (a silent re-download when the engine rejects the on-disk files) does not surface to this callback.
- Background transfers — construct with ModelDownloader(sessionConfiguration: .background(withIdentifier:)) so transfers continue when the app is suspended. See Constructing the downloader.
A loadModel(manifestUrl:, ...) overload exists with the same shape if you're loading from a manifest URL directly.
Cross-platform LeapDownloader.loadModel
LeapDownloader.loadModel(...) is the cross-platform manifest loader. On iOS it works the same way ModelDownloader.loadModel(...) does, minus the URLSession-backed background-transfer support. Use it when you're building cross-platform Swift/Kotlin code or don't need background downloads. Note that LeapDownloader is reachable through import LeapModelDownloader — there's no need for a separate import LeapSDK (and the dual-import build-time guard will flag it if you add one).
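A hedged Kotlin sketch of the cross-platform path; the parameter labels come from the naming note above, but the progress-callback shape and the slugs are assumptions:

```kotlin
// Illustrative sketch — modelName/quantizationType labels are documented;
// the trailing progress-lambda shape is an assumption.
val downloader = LeapDownloader(LeapDownloaderConfig(saveDir = modelsDir))
val runner = downloader.loadModel(
    modelName = "LFM2-1.2B",       // illustrative model slug
    quantizationType = "Q4_K_M",   // illustrative quantization slug
) { fraction, bytesPerSec ->
    println("download: ${(fraction * 100).toInt()}% at $bytesPerSec B/s")
}
```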
Legacy: Leap.load(model:quantization:options:)
The 0.9.x-style Leap.load(...) compatibility surface still works and wraps LeapDownloader.loadModel internally. New code should prefer ModelDownloader.loadModel(...) for app integrations, or LeapDownloader.loadModel(...) for cross-platform code.
Sideloaded files
Use this path when you ship the model as an app asset, adb push it for development, download it via your own pipeline, or stage a multimodal model with its companion files in a known directory — anything that doesn't go through the LEAP manifest registry.
- Swift (iOS / macOS)
- Kotlin (all platforms)
ModelSource's path accepts an absolute filesystem path, a file:// URL, or an http(s):// URL (fetched and cached on first use through URLSession, so HTTPS sources inherit the same background-session support as downloadModel).
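A sketch of sideloaded loading in Kotlin; path and quantizationId are the documented ModelSource fields, while the file path and the variable wiring are illustrative assumptions:

```kotlin
// Hypothetical sketch — the .gguf path below is illustrative.
val source = ModelSource(
    path = "/sdcard/models/lfm2-1.2b-q4_k_m.gguf",  // absolute path, file:// or http(s):// URL
    quantizationId = "Q4_K_M",                      // part of the source descriptor, not a loader parameter
)
// loadSimpleModel(model: ModelSource(...)) is the race-free, explicit form.
val runner = downloader.loadSimpleModel(model = source)
```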
Legacy: Leap.load(url:options:)
The 0.9.x-style URL-based loader still works. Auto-detection picks up sibling mmproj-*.gguf (vision) and audio decoder files (.gguf/.bin whose name contains “audio” and “decoder”). New code should prefer loadSimpleModel(model: ModelSource(...)) for race-free, explicit wiring.
Fetch without loading
Useful for onboarding flows that prefetch over Wi-Fi or staging models you'll load later. A subsequent loadModel(...) call with the same identifiers picks up the cached files without re-downloading.
- Swift (iOS / macOS)
- Kotlin (Android)
- Kotlin (JVM / native)
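The prefetch flow can be sketched with the lifecycle calls named earlier (requestDownloadModel / queryStatus / loadModel); the status-enum name, the polling shape, and the slugs are assumptions:

```kotlin
// Hypothetical sketch of a Wi-Fi prefetch flow; DownloadStatus.Completed
// is an assumed enum name, and polling stands in for observing progress.
suspend fun prefetchThenLoad(downloader: LeapModelDownloader): ModelRunner {
    downloader.requestDownloadModel(modelName = "LFM2-1.2B", quantizationType = "Q4_K_M")

    // Wait until the files have landed in the shared cache.
    while (downloader.queryStatus("LFM2-1.2B", "Q4_K_M") != DownloadStatus.Completed) {
        delay(500)
    }

    // Same identifiers → picks up the cached files without re-downloading.
    return downloader.loadModel(modelName = "LFM2-1.2B", quantizationType = "Q4_K_M")
}
```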
Runtime options
LiquidInferenceEngineOptions / ModelLoadingOptions
Per-load runtime overrides. Default values come from the model bundle’s manifest.
- Swift (iOS / macOS)
- Kotlin (all platforms)
Pass LiquidInferenceEngineManifestOptions to ModelDownloader.loadModel(modelName:, quantizationType:, options:, ...) for manifest-based loads, and LiquidInferenceEngineOptions to Leap.load(url:, options:) for sideloaded GGUFs. Build them with .with(...) on GenerationOptions, LiquidInferenceEngineOptions, or LiquidInferenceEngineManifestOptions:
- cpuThreads — CPU thread count for token generation. Kotlin defaults to CpuThreadAdvisor.getRecommendedThreadCount(); Swift defaults to the engine's pick when nil.
- contextSize — override the maximum context length. Kotlin defaults to 8192; Swift defaults to the model's recommendation when nil.
- useMmap — tristate Boolean?. null (default) defers to the engine default of true. Set to false to force full-read loading on filesystems where mmap misbehaves (some Android scoped-storage paths, certain network mounts). Added in v0.10.5.
- nGpuLayers (Swift) — number of transformer blocks to offload to GPU (macOS Metal). -1 offloads everything.
- audioDecoderUseGpu (Swift) — opt the audio decoder onto the Metal backend.
- randomSeed (Kotlin) — reproducible sampling seed.
- cacheOptions — KV cache reuse (see next section). On Kotlin this is an EngineOptions.CacheOptions value with an explicit enabled master switch (replaces the v0.10.4 cacheDir: String?).
- mmProjPath / audioDecoderPath / audioTokenizerPath (Swift) — companion file overrides. Leave nil to auto-detect siblings of the GGUF file. On Kotlin these are passed via ModelSource.
- chatTemplate — advanced override for backend chat templating.
- extras — backend-specific configuration payload (JSON string).
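A Kotlin sketch combining a few of these overrides; the exact .with(...) builder spellings are assumptions, only the option names and their defaults are documented:

```kotlin
// Hypothetical sketch — builder spellings assumed; option names documented above.
val options = ModelLoadingOptions()
    .with(cpuThreads = CpuThreadAdvisor.getRecommendedThreadCount())  // Kotlin default shown explicitly
    .with(contextSize = 4096)   // below the 8192 Kotlin default to save memory
    .with(useMmap = false)      // force full-read loading where mmap misbehaves

val runner = downloader.loadModel(
    modelName = "LFM2-1.2B",    // illustrative slug
    quantizationType = "Q4_K_M",
    options = options,
)
```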
Companion files. GGUF checkpoints look for sibling vision (mmproj) and audio (decoder / tokenizer) files unless you override the paths. Co-locate them next to the model file or pass explicit paths via ModelSource for vision and audio features.
GenerationTimeParameters & SamplingParameters (Kotlin)
Optional per-load overrides for the manifest’s recommended generation defaults.
KV cache reuse
EngineOptions.CacheOptions (Kotlin) / LiquidCacheOptions (Swift) tells the engine to persist KV-cache data between generations so requests sharing a prompt prefix can skip the prefill work for the shared tokens. Added in v0.10.4; Swift convenience surface in v0.10.4.3; per-tier bounded-LRU caps stabilized in v0.10.5.
How it works
Transformer inference has two phases:
- Prefill — the model runs the full prompt through every layer and stores the attention keys and values (the “KV cache”) for each prompt token. O(prompt_length). Dominates time-to-first-token (TTFT) for prompts longer than a few hundred tokens on-device.
- Decode — each new output token only attends back to the cached K/V vectors. O(1) per token in prompt length.
When it helps
| Use case | What’s reused |
|---|---|
| Multi-turn chat with a long system prompt | System prompt + earlier turns |
| RAG (retrieval-augmented generation) | The retrieved document context preceding the user question |
| Few-shot prompting | The fixed example set preceding each new query |
| Agent loops | Tool definitions, role instructions, task scaffold |
| Voice assistant continuations | Everything before the latest user turn |
| Streaming UI with quick edits | The unchanged prefix when a user edits the tail of a prompt |
Configuration
- Swift (iOS / macOS)
- Kotlin (Android)
- Kotlin (JVM / native)
LiquidInferenceEngineManifestOptions (manifest loads) and LiquidInferenceEngineOptions (sideloaded loads) both expose with(cacheOptions:) builders for chaining onto an existing options value. Use the app's cachesDirectory (not documentDirectory) so iOS may reclaim space under storage pressure.
Bounded-LRU caps
The CacheOptions value exposed in v0.10.5 has six fields plus a diskDisabled flag for memory-only mode.
With enabled = true, the disk-tier entry cap resolves to maxEntriesDisk if > 0, else maxEntries (legacy alias), else the engine default of 4096. Memory-tier defaults (256 entries / 512 MiB) apply unless you override them. The ModelLoadingOptions.cacheOptions(path = ...) factory preserves the historical 40-entry disk budget for callers migrating from cacheDir.
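The disk-cap precedence can be expressed as a small pure function — a paraphrase of the rule above for clarity, not SDK code:

```kotlin
// Resolve the disk-tier entry cap per the documented precedence:
// maxEntriesDisk if > 0, else maxEntries (legacy alias) if > 0,
// else the engine default of 4096.
fun resolveDiskCap(maxEntriesDisk: Int, maxEntries: Int): Int = when {
    maxEntriesDisk > 0 -> maxEntriesDisk
    maxEntries > 0 -> maxEntries
    else -> 4096
}
```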
Notes and caveats
- Per-model. A cache directory is tied to the model bundle that wrote it. Don’t share one directory across different model checkpoints.
- Prefix-keyed. Reuse is based on the leading tokens of the prompt. Changing the system prompt, sampling parameters that alter prompt formatting, or tool definitions invalidates the cache for that branch.
- Cross-launch. Cached entries survive process restarts. Delete the directory to reset.
- First call. The first request for a given prefix sees no speedup — it’s the call that writes the entry. Subsequent calls hit the cache.
- Memory-only mode. Pass EngineOptions.CacheOptions(path = ..., enabled = true, diskDisabled = true) to skip the disk tier entirely — useful for benchmarking or callers that don't need cross-restart persistence.
- wasmJs caveat. The WASM bridge currently drops the entire cache_options block; a one-shot warning is logged when enabled = true is set on wasmJs. Native (Apple, Linux, MinGW), JVM, and Android propagate all fields end-to-end.
- Swift backwards compat. Prior to v0.10.4.3 the cacheOptions parameter was only reachable through the verbose Obj-C designated init with KotlinUInt(unsignedInt:) wrapping. New code should use .enabled(path:) and the with(...) builders.
Leap Model Service (Android)
leap-model-service is an optional, separately-installable Android service that hosts loaded LEAP models in its own process and lets multiple client apps share them. Added in v0.10.5.
When the service is installed on a device, LeapModelDownloader.loadModel(...) from any client app routes through it transparently — the model is downloaded once, loaded once, and re-used across apps. When the service is not installed, LeapModelDownloader.loadModel(...) falls back to in-process loading. Client apps need zero code changes.
What you get
- Cross-app model sharing. Multiple apps that load the same model + quantization share one in-memory copy.
- Persistent foreground notification with live state (“Loading model…”, “Generating… N active”, “Ready — N models loaded”).
- Per-UID session quotas (max 3 sessions per client app, enforced by the service).
- Disk-backed KV cache reuse across cold starts — the service maintains its own KV cache directory, so prefill warmup persists across process restarts and across client apps.
- Service-side progress — when routing through the service, LeapModelDownloader.loadModel(...)'s progress callback fires for service-side downloads too. Passing null (the default) preserves the original deferred-load behavior (the model loads on first session creation rather than eagerly inside loadModel).
- AIDL-routed function calling — Conversation.registerFunction(...) and registerFunctions(...) are forwarded to the service and applied on the shared session.
When to install the service
The service is distributed as a separate APK and is appropriate for:
- Multi-app deployments where two or more LEAP-using apps run on the same device.
- System-image integrations where the device manufacturer or MDM pre-installs the service.
- Long-running background inference where the foreground-service notification is desirable.
For single-app deployments you don't need it — LeapModelDownloader already does the right thing in-process.
Permissions
The service requires the POST_NOTIFICATIONS runtime permission (Android 13+) to display its foreground notification. If the permission is missing, LeapServiceClient.connect() logs a warning and falls back to in-process loading. Direct the user to grant the permission via LeapServiceClient.isServiceNotificationPermissionGranted() + getOpenServiceAppIntent() — auto-launching another app from a library call would be too intrusive.
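A sketch of the recommended permission flow using the two calls named above; the parameter lists and the surrounding UI wiring are assumptions:

```kotlin
// Hypothetical sketch — method names come from the docs; whether they take
// a Context, and the UI around them, are assumptions.
fun promptForServiceNotificationPermission(activity: Activity) {
    if (!LeapServiceClient.isServiceNotificationPermissionGranted()) {
        // Don't auto-launch the service app from library code; surface a
        // prompt and let the user jump to the service's settings themselves.
        val intent = LeapServiceClient.getOpenServiceAppIntent()
        if (intent != null) {
            activity.startActivity(intent)
        }
        // If intent is null, fall back to in-process loading silently.
    }
}
```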
Notes
- The service ignores caller-supplied cacheDir paths (it maintains its own KV cache directory) — pass cacheOptions on ModelLoadingOptions to control the in-memory + disk caps, not the path.
- First-load wins: when multiple apps request the same model simultaneously, the first call's ModelLoadingOptions are applied; subsequent callers receive the shared runner regardless of their options. Read the effective config back via LeapServiceClient.getLoadedModelConfig.
- Models stay loaded until the service is shut down or restarted. evictUnusedModel is a no-op by design — eviction would race with in-flight generations.
ProgressData / Manifest
You rarely need to construct a Manifest yourself — downloadModel and loadModel populate and return it for you.