Quick Start Guide

Latest version: v0.9.2

Prerequisites

Make sure you have:

Xcode 15.0 or later with Swift 5.9.
An iOS project targeting iOS 15.0+ (macOS 12.0+ or Mac Catalyst 15.0+ are also supported).
A physical iPhone or iPad with at least 3 GB RAM for best performance. The simulator works for development but runs models much slower.

iOS Deployment Target: 15.0
macOS Deployment Target: 12.0

Always test on a real device before shipping. Simulator performance is not representative of production behaviour.

Install the SDK

Choose your preferred installation method:

Swift Package Manager
CocoaPods
Manual

Recommended

In Xcode choose File -> Add Package Dependencies.
Enter https://github.com/Liquid4All/leap-ios.git.
Select the 0.9.2 release (or newer).
Add the LeapSDK product to your app target.
(Optional) Add LeapModelDownloader if you plan to download model bundles at runtime.

Add the pod to your Podfile:

pod 'Leap-SDK', '~> 0.9.2'
# Optional: pod 'Leap-Model-Downloader', '~> 0.9.2'

Run pod install
Reopen the .xcworkspace.

Download LeapSDK.xcframework.zip (and optionally LeapModelDownloader.xcframework.zip) from the GitHub releases.
Unzip and drag the XCFramework(s) into Xcode.
Set the Embed setting to Embed & Sign for each framework.

The constrained-generation macros (@Generatable, @Guide) ship inside the LeapSDK product. No additional package is required.

Getting and Loading Models

The SDK uses GGUF manifests for loading models (recommended for all new projects due to superior inference performance and better default generation parameters).

Legacy Executorch bundle support is available in the accordion below for existing projects.

Loading from GGUF manifest

The LEAP Edge SDK supports directly downloading LEAP models in GGUF format. Given the model name and quantization method (which you can find in the LEAP Model Library), the SDK will automatically download the necessary GGUF files along with generation parameters for optimal performance.

import LeapSDK
import LeapModelDownloader
import Combine

@MainActor
final class ChatViewModel: ObservableObject {
    @Published var isLoading = false
    @Published var conversation: Conversation?
    private var modelRunner: ModelRunner?
    private var generationTask: Task<Void, Never>?

    func loadModel() async {
        isLoading = true
        defer { isLoading = false }
        do {
            // LEAP will download the model if needed or reuse a cached copy.
            let modelRunner = try await Leap.load(model: "LFM2-1.2B", quantization: "Q5_K_M", downloadProgressHandler: { progress, speed in
                // progress: Double (0...1)
                // speed: bytes per second
            })
            conversation = modelRunner.createConversation(systemPrompt: "You are a helpful travel assistant.")
            self.modelRunner = modelRunner
        } catch {
            print("Failed to load model: \(error)")
        }
    }

    func send(_ text: String) {
        guard let conversation else { return }
        generationTask?.cancel()
        let userMessage = ChatMessage(role: .user, content: [.text(text)])
        generationTask = Task { [weak self] in
            do {
                for try await response in conversation.generateResponse(
                    message: userMessage,
                    generationOptions: GenerationOptions(temperature: 0.7)
                ) {
                    self?.handle(response)
                }
            } catch {
                print("Generation failed: \(error)")
            }
        }
    }

    func stopGeneration() {
        generationTask?.cancel()
    }

    @MainActor
    private func handle(_ response: MessageResponse) {
        switch response {
        case .chunk(let delta):
            print(delta, terminator: "") // Update UI binding here
        case .reasoningChunk(let thought):
            print("Reasoning:", thought)
        case .audioSample(let samples, let sr):
            print("Received audio samples \(samples.count) at sample rate \(sr)")
        case .functionCall(let calls):
            print("Requested calls: \(calls)")
        case .complete(let completion):
            if let stats = completion.stats {
                print("Finished with \(stats.totalTokens) tokens")
            }
            let text = completion.message.content.compactMap { part -> String? in
                if case .text(let value) = part { return value }
                return nil
            }.joined()
            print("Final response:", text)
            // completion.message.content may also include `.audio` entries you can persist or replay
        }
    }
}

Legacy: Executorch Bundles

Browse the Leap Model Library and download a .bundle file for the model/quantization you want. .bundle packages contain metadata plus assets for the ExecuTorch backend.You can either:

Ship it with the app - drag the bundle into your Xcode project and ensure it is added to the main target.
Download at runtime - use LeapModelDownloader to fetch bundles on demand.

Alternative: Download at runtime

import LeapModelDownloader

let model = await LeapDownloadableModel.resolve(
  modelSlug: "lfm2-350m-enjp-mt",
  quantizationSlug: "lfm2-350m-enjp-mt-20250904-8da4w"
)
if let model {
  let downloader = ModelDownloader()
  downloader.requestDownloadModel(model)
  let status = await downloader.queryStatus(model)
  switch status {
  case .downloaded:
    let bundleURL = downloader.getModelFile(model)
    try await runModel(at: bundleURL)
  case .downloadInProgress(let progress):
    print("Progress: \(Int(progress * 100))%")
  case .notOnLocal:
    print("Waiting for download...")
  }
}

Use Leap.load(url:options:) inside an async context. Passing a .bundle loads the model through the ExecuTorch backend.

Example

import LeapSDK

@MainActor
final class ChatViewModel: ObservableObject {
  @Published var isLoading = false
  @Published var conversation: Conversation?
  private var modelRunner: ModelRunner?
  private var generationTask: Task<Void, Never>?

  func loadModel() async {
    guard let bundleURL = Bundle.main.url(forResource: "LFM2-350-ENJP-MT", withExtension: "bundle") else {
      assertionFailure("Model bundle missing")
      return
    }
    isLoading = true
    defer { isLoading = false }
    do {
      modelRunner = try await Leap.load(url: bundleURL)
      conversation = modelRunner?.createConversation(systemPrompt: "You are a helpful travel assistant.")
    } catch {
      print("Failed to load model: \(error)")
    }
  }

  func send(_ text: String) {
    guard let conversation else { return }
    generationTask?.cancel()
    let userMessage = ChatMessage(role: .user, content: [.text(text)])
    generationTask = Task { [weak self] in
      do {
        for try await response in conversation.generateResponse(
          message: userMessage,
          generationOptions: GenerationOptions(temperature: 0.7)
        ) {
          await self?.handle(response)
        }
      } catch {
        print("Generation failed: \(error)")
      }
    }
  }

  func stopGeneration() {
    generationTask?.cancel()
  }

  @MainActor
  private func handle(_ response: MessageResponse) {
    switch response {
    case .chunk(let delta):
      print(delta, terminator: "") // Update UI binding here
    case .reasoningChunk(let thought):
      print("Reasoning:", thought)
    case .audioSample(let samples, let sr):
      audioRenderer.enqueue(samples, sampleRate: sr)
    case .functionCall(let calls):
      print("Requested calls: \(calls)")
    case .complete(let completion):
      if let stats = completion.stats {
        print("Finished with \(stats.totalTokens) tokens")
      }
      let text = completion.message.content.compactMap { part -> String? in
        if case .text(let value) = part { return value }
        return nil
      }.joined()
      print("Final response:", text)
      // completion.message.content may also include `.audio` entries you can persist or replay
    }
  }
}

Need custom runtime settings (threads, context size, GPU layers)? Pass a LiquidInferenceEngineOptions value:

let options = LiquidInferenceEngineOptions(
  bundlePath: bundleURL.path,
  cpuThreads: 6,
  contextSize: 8192,
  nGpuLayers: 8
)
let runner = try await Leap.load(url: bundleURL, options: options)

Stream responses

send(_:) (shown above) launches a Task that consumes the AsyncThrowingStream returned by Conversation.generateResponse. Each MessageResponse case maps to UI updates, tool execution, or completion metadata. Cancel the task manually (for example via stopGeneration()) to interrupt generation early. You can also observe conversation.isGenerating to disable UI controls while a request is in flight.

Send images and audio (optional)

When the loaded model ships with multimodal weights (and companion files were detected), you can mix text, image, and audio content in the same message:

let message = ChatMessage(
  role: .user,
  content: [
    .text("Describe what you see."),
    .image(jpegData)  // Data containing JPEG bytes
  ]
)

let audioMessage = ChatMessage(
  role: .user,
  content: [
    .text("Transcribe and summarize this clip."),
    .audio(wavData)  // Data containing WAV bytes
  ]
)

let pcmMessage = ChatMessage(
  role: .user,
  content: [
    .text("Give feedback on my pronunciation."),
    ChatMessageContent.fromFloatSamples(samples, sampleRate: 16000)
  ]
)

Add tool results back to the history

let toolMessage = ChatMessage(
  role: .tool,
  content: [
    .text("{\"temperature\":72,\"conditions\":\"sunny\"}"),
    .audio(toolAudioData) // Optional: return audio bytes from your tool
  ]
)

guard let current = conversation else { return }
let updatedHistory = current.history + [toolMessage]
conversation = current.modelRunner.createConversationFromHistory(
  history: updatedHistory
)

Next steps

Learn how to expose structured JSON outputs with the @Generatable macros.
Wire up tools and external APIs with Function Calling.
Compare on-device and cloud behaviour in Cloud AI Comparison.

You now have a project that loads an on-device model, streams responses, and is ready for advanced features like structured output and tool use. Edit this page

Get Started

iOS

Android

Model Bundling Service

Prerequisites

Install the SDK

Getting and Loading Models

Loading from GGUF manifest

Example

Stream responses

Send images and audio (optional)

Add tool results back to the history

Next steps

Get Started

iOS

Android

Model Bundling Service

​Prerequisites​

​Install the SDK​

​Getting and Loading Models​

​Loading from GGUF manifest

​Example

​Stream responses​

​Send images and audio (optional)​

​Add tool results back to the history​

​Next steps​

Prerequisites

Install the SDK

Getting and Loading Models

Loading from GGUF manifest

Example

Stream responses

Send images and audio (optional)

Add tool results back to the history

Next steps