If you've used a cloud chat-completion API (OpenAI, Anthropic, etc.), most of LEAP's shape will be familiar: async streaming, role-tagged messages, JSON-serializable history. The biggest difference: you load the model explicitly, locally, before generation, instead of pointing a client at a remote endpoint. This page maps the OpenAI Python client's flow onto the LEAP SDK across Swift, Kotlin (Android), and Kotlin (JVM / native). For OpenAI compatibility on the client side, also see OpenAI-Compatible Client.
Reference: an OpenAI streaming call
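For orientation, here is the minimal streaming call with the OpenAI Python client that the sections below map onto (the model name is illustrative):

```python
from openai import OpenAI

client = OpenAI()  # thin client; the model lives behind a remote endpoint

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Write a haiku about streams."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta is not None:
        print(delta, end="", flush=True)
```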
1. Load the model (vs. construct a client)
Cloud APIs create a thin client that points at a remote endpoint. LEAP downloads the model the first time and loads it into a `ModelRunner`; this typically takes a few seconds, depending on model size and device.
- OpenAI (Python)
- Swift (iOS / macOS)
- Kotlin (Android)
- Kotlin (JVM / native)
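A minimal Swift sketch of the load step. The `Leap.load(url:)` entry point and the bundled-model lookup are illustrative; see the Quick Start for the exact loader call on your platform:

```swift
import Foundation
import LeapSDK

// Download-on-first-run, then load into a ModelRunner (a few seconds
// depending on model size and device).
guard let modelURL = Bundle.main.url(forResource: "model", withExtension: "bundle") else {
    fatalError("model bundle missing from app resources")
}
let modelRunner = try await Leap.load(url: modelURL)
```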
`ModelRunner` plays the same role as the cloud API's client object, except it carries the model weights. Release it and you'll have to load again before generating.
2. Request generation
The cloud API takes a `messages` array and returns a stream. LEAP attaches messages to a `Conversation` (so history is tracked automatically) and returns an async stream from `generateResponse(...)`.
- OpenAI (Python)
- Swift (iOS / macOS)
- Kotlin (all platforms)
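In Swift, the request looks roughly like this; the `Conversation` initializer shape is an assumption, while `ChatMessage` and `generateResponse(...)` are the names used throughout this page:

```swift
// A Conversation tracks history automatically; initializer shape assumed.
let conversation = Conversation(modelRunner: modelRunner, history: [])

let message = ChatMessage(role: .user, content: [.text("Write a haiku about streams.")])

// Returns an async stream (AsyncThrowingStream) of MessageResponse values.
let stream = conversation.generateResponse(message: message)
```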
The `Conversation` is already bound to the runner that loaded the model, so you don't pass the runner on each call.
3. Consume the stream
Cloud APIs deliver deltas; you concatenate them. LEAP delivers `MessageResponse` values; each variant maps to a UI update, audio frame, tool call, or completion marker.
- OpenAI (Python)
- Swift (iOS / macOS)
- Kotlin (all platforms)
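A Swift sketch of the consumption loop. The exact `MessageResponse` case shapes are illustrative; `functionCalls` and the completion stats are the names given in the table below:

```swift
var transcript = ""

for try await response in stream {
    switch response {
    case .chunk(let text):
        transcript += text                    // incremental text delta
    case .functionCalls(let calls):
        print("model requested \(calls.count) function call(s)")
    case .complete(_, let info):
        print("\nstats:", info.stats as Any)  // promptTokens, completionTokens, tokenPerSecond
    default:
        break                                 // other variants, e.g. audio frames
    }
}
```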
4. Async context
Both LEAP and the OpenAI Python streaming client run inside an async context. The SDK's call shape mirrors the language's idiomatic concurrency primitives.
- Swift (iOS / macOS)
- Kotlin (Android)
- Kotlin (JVM / native)
Wrap calls in a `Task`. SwiftUI's `.task` modifier on a view is the most common entry point. `@MainActor` view models keep model state on the main thread; the `for try await` loop suspends the task until the next chunk arrives.
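Putting that together in SwiftUI (a sketch; the view wiring is illustrative):

```swift
import SwiftUI
import LeapSDK

struct ChatView: View {
    let conversation: Conversation
    @State private var output = ""

    var body: some View {
        ScrollView { Text(output) }
            .task {
                // Task tied to the view's lifetime; the loop suspends
                // until the next chunk arrives.
                do {
                    let message = ChatMessage(role: .user, content: [.text("Hello!")])
                    for try await response in conversation.generateResponse(message: message) {
                        if case .chunk(let text) = response {
                            output += text  // UI state stays on the main actor
                        }
                    }
                } catch {
                    output = "Generation failed: \(error)"
                }
            }
    }
}
```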
What's the same
| Concept | OpenAI | LEAP |
|---|---|---|
| Role-tagged messages | `{"role": "user", "content": "..."}` | `ChatMessage(role: .user, content: [.text("...")])` |
| Streaming responses | `stream=True` iterator | `AsyncThrowingStream` (Swift) / `Flow` (Kotlin) |
| Function calling | Tool definitions + `tool_calls` field | `registerFunction(LeapFunction)` + `MessageResponse.functionCalls` |
| Structured output | `response_format = json_schema` | `GenerationOptions.setResponseFormat(type:)` |
| Token usage stats | `usage` object on completion | `Complete.stats` (`promptTokens`, `completionTokens`, `tokenPerSecond`) |
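For example, the function-calling and structured-output rows translate to calls like these. Only `registerFunction(LeapFunction)` and `GenerationOptions.setResponseFormat(type:)` come from the table; the field names, the `WeatherReport` type, and its conformance requirement are assumptions:

```swift
// Function calling: register a tool on the conversation.
let weather = LeapFunction(
    name: "get_weather",                      // illustrative definition
    description: "Current weather for a city",
    parameters: []
)
conversation.registerFunction(weather)

// Structured output: constrain generation to a schema-backed type.
struct WeatherReport: Codable {               // conformance requirement assumed
    let city: String
    let summary: String
}
var options = GenerationOptions()
try options.setResponseFormat(type: WeatherReport.self)
```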
What's different
- No remote endpoint. You ship the model with the app (or download it the first time it runs). Latency is bounded by device CPU/GPU, not network round-trips.
- Explicit lifecycle. Hold a `ModelRunner` reference; `unload()` when done. Cloud clients never load anything explicitly.
- Multimodal inputs go in the `content` array, same as OpenAI. Image and audio parts use the same OpenAI `image_url` / `input_audio` wire format.
- Companion files for multimodal models. Vision- and audio-capable models need an `mmproj` (vision) and/or audio decoder/tokenizer co-located on disk. Manifest-based loading handles this automatically; `loadSimpleModel` accepts explicit `mmprojPath` / `audioDecoderPath` / `audioTokenizerPath` (see the sketch below).
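A sketch of the explicit-path variant; the companion-file parameter names come from the list above, while the entry-point spelling, `modelPath` label, and paths are assumptions:

```swift
// Explicit companion-file loading for a vision-capable model.
let visionRunner = try await Leap.loadSimpleModel(
    modelPath: "/models/vision-model.gguf",   // illustrative paths
    mmprojPath: "/models/mmproj.gguf",        // vision projector
    audioDecoderPath: nil,                    // audio-capable models only
    audioTokenizerPath: nil
)
```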
Next steps
- Quick Start: full setup for your platform.
- OpenAI-Compatible Client: the `LeapOpenAIClient` lets you point an OpenAI-style client at any OpenAI-compatible endpoint.
- Conversation & Generation: full streaming API reference.