Conversation & Generation

All functions listed in this document are safe to call from the main or UI thread and all callbacks will be run on the main thread, unless there are explicit instructions or explanations.

Conversation

The instance of a conversation, which stores the message history and states that are needed by the model runner for generation. While this Conversation instance holds the data necessary for the model runner to perform generation, the app still needs to maintain the UI state of the message history representation.

interface Conversation {
  // Chat history
  val history: List<ChatMessage>
  // Whether a generation is in progress
  val isGenerating: Boolean

  // Generating response from a text input as user message
  fun generateResponse(userTextMessage: String, generationOptions: GenerationOptions? = null): Flow<MessageResponse>
  // Generating response from an arbitrary new message
  fun generateResponse(message: ChatMessage, generationOptions: GenerationOptions? = null): Flow<MessageResponse>
  // Register a function to the conversation for the model to invoke.
  fun registerFunction(function: LeapFunction)
  // Export the chat history in this conversation to a `JSONArray`.
  fun exportToJSONArray(): JSONArray
}

Creation

Instance of this class should not be directly initialized. It should instead be created by the ModelRunner instance.

Lifetime

While a Conversation stores the history and state that is needed by the model runner to generate content, its generation function relies on the model runner that creates it. As a result, if that model runner instance has been destroyed, the Conversation instance will fail to run subsequent generations.

`history`

history value field will return a copy of the chat message history. Any mutations to its return value will not change the internal state of the generation. If there is an ongoing generation, the partial message may not be available in the return value of this field. However, it is guaranteed that when MessageResponse.Complete is received and when the flow is completed, the history value field will be updated to have the latest message.

`isGenerating`

isGenerating value field is true if the generation is still in progress. Its value will be consistent across all threads.

`generateResponse`

generateResponse(message: ChatMessage) is the preferred method for response generation. It can be called from UI thread. The return value is a Kotlin asynchronous flow. The generation will not start until the flow is collected (following the convention of flows). Refer to Android documentation on how to properly handle the flow with lifecycle-aware components. A MessageResponse instance will be emitted from this flow, which contains the chunk of data generated from the model.

Errors will be thrown as LeapGenerationException in the stream. Use .catch to capture errors from the generation.

If there is already a running generation, further generation requests are blocked until the current generation is done. However, there is no guarantee that the order in which requests are received will be the order in which they are processed.

`registerFunction`

`exportToJSONArray`

Export the whole conversation history into a JSONArray. Each element can be interpreted as a ChatCompletionRequestMessage instance in OpenAI API schema. See also: Serialization Support.

Cancellation of the generation

Generation will be stopped when the coroutine job that runs the flow is canceled, but it may (no guarantee) keep going as long as the job of the flow is still active. We highly recommend using a ViewModel with viewModelScope to manage generation lifecycle. The generation will be automatically canceled when the ViewModel is cleared. Here is an example:

import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.Job
import kotlinx.coroutines.runBlocking

class ChatViewModel(application: Application) : AndroidViewModel(application) {
    private var conversation: Conversation? = null
    private var modelRunner: ModelRunner? = null
    private var generationJob: Job? = null

    private val _generatedText = MutableStateFlow("")
    val generatedText: StateFlow<String> = _generatedText.asStateFlow()

    fun generateResponse(userInput: String) {
        generationJob = viewModelScope.launch {
            _generatedText.value = ""
            conversation?.generateResponse(userInput)
                ?.onEach { response ->
                    when (response) {
                        is MessageResponse.Chunk -> {
                            _generatedText.value += response.text
                        }
                        is MessageResponse.Complete -> {
                            Log.d(TAG, "Generation is done!")
                        }
                        else -> {}
                    }
                }
                ?.collect()
        }
    }

    fun stopGeneration() {
        generationJob?.cancel()
        generationJob = null
    }

    override fun onCleared() {
        super.onCleared()
        // Generation is automatically canceled when ViewModel is cleared
        generationJob?.cancel()

        // Use runBlocking to ensure model is unloaded before ViewModel is destroyed
        // viewModelScope is cancelled during clearing, so we need a non-cancelled context
        runBlocking(Dispatchers.IO) {
            modelRunner?.unload()
        }
    }

    companion object {
        private const val TAG = "ChatViewModel"
    }
}

GenerationOptions

A data class to represents options in generating responses from a model.

data class GenerationOptions(
    var temperature: Float? = null,
    var topP: Float? = null,
    var minP: Float? = null,
    var repetitionPenalty: Float? = null,
    var jsonSchemaConstraint: String? = null,
    var functionCallParser: LeapFunctionCallParser? = LFMFunctionCallParser(),
) {
  fun setResponseFormatType(kClass: KClass<*>)

  companion object {
    fun build(buildAction: GenerationOptions.() -> Unit): GenerationOptions
  }
}

Fields

temperature: Sampling temperature parameter. Higher values will make the output more random, while lower values will make it more focused and deterministic.
topP: Nucleus sampling parameter.In nucleus sampling, the model only considers the results of the tokens with topP probability mass.
minP: Minimal possibility for a token to be considered in generation.
repetitionPenalty: Repetition penalty parameter. A positive value will decrease the model’s likelihood to repeat the same line verbatim.
jsonSchemaConstraint: Enable constrained generation with a JSON Schema. See constrained generation for more details.
functionCallParser: Define the parser for function calling requests from the model. See function calling guide for more details.

Methods

setResponseFormatType: Enable constrained generation with a Generatable data class. See constrained generation for more details.

Kotlin builder function GenerationOptions.build is also available. For example,

val options = GenerationOptions.build {
  setResponseFormatType(MyDataType::class)
  temperature = 0.5f
}

If a parameter is not set in this options, the default value from the model bundle will be used.

`ModelRunner`

An instance of the model loaded in memory. Conversation instances should always be created from an instance of ModelRunner. The application needs to own the model runner object – if the model runner object is destroyed, any ongoing generations may fail. If you need your model runner to survive after the destruction of activities, you may need to wrap it in an Android Service.

interface ModelRunner {
  // create a conversation instance
  fun createConversation(systemPrompt: String? = null): Conversation

  // create a conversation from chat message history
  fun createConversationFromHistory(history: List<ChatMessage>): Conversation

  // unload the model: the runner cannot be used after unload is called.
  suspend fun unload()


  // Start generation from the conversation instance.
  fun generateFromConversation(
    conversation: Conversation,
    callback: GenerationCallback,
    generationOptions: GenerationOptions? = null,
  ): GenerationHandler
}

`createConversation`

Factory method to create a conversation instance based on this model runner. As a result, the model runner instance will be used for any generation around the created conversation instance. The model runner will have access to the internal state of the created conversation. If the model runner is unloaded, any conversation instances created from the model runner will be read only.

`createConversationFromHistory`

This factory method will create a conversation object with the provided chat history. It can be used to restore a conversation from persistent storage while ensuring that a living model runner is backing it.

`unload`

Unload the model from memory. The model runner will not be able to perform generation once this method is invoked. An exception may be thrown by any ongoing generation. It is the app developer’s responsibility to ensure that unload is called after all generation is complete.

`generateFromConversation`

This function is not recommended to be called by the app directly. It is an internal interface for the model runner implementation to expose the generation ability to LEAP SDK. Conversation.generateResponse is the better wrapper of this method, which relies on Kotlin coroutines to connect with lifecycle-aware components.

This function may block the caller thread. If you must use it, please call it outside the main thread.

`MessageResponse`

The response generated from models. The generation may take a while to finish, so the generated text will be emitted as “chunks”. When the generation completes, a complete response object will be emitted. This is a sealed class where only the following options are available:

sealed interface MessageResponse {
  class Chunk(val text: String) : MessageResponse
  class ReasoningChunk(val reasoning: String) : MessageResponse
  class FunctionCalls(val functionCalls: List<LeapFunctionCall>): MessageResponse
  class AudioSample(val samples: FloatArray, val sampleRate: Int): MessageResponse
  class Complete(
    val fullMessage: ChatMessage,
    val finishReason: GenerationFinishReason,
    val stats: GenerationStats?,
  ) : MessageResponse
}

Chunk is a piece of generated text.
ReasoningChunk is a piece of generated reasoning text. It will be emitted only by reasoning models.
FunctionCalls is a group of function call requests from the model. It will only be emitted if some functions are registered to the conversation.
AudioSample is a piece of generated audio. Samples are encoded as 32-bit float, and the audio sample rate is provided as a field. During the whole generation, the sample rate will not change.
Complete indicates the completion of a generation.
The fullMessage field contains the complete ChatMessage with all the content generated from this round of generation
The finishReason indicates why the generation is done. STOP means the model decides to stop generation, while EXCEED_CONTEXT means that the generated content reaches the maximum context length.
The stats field contains statistics of the generation including promptTokens, completionTokens, totalTokens and tokenPerSecond. This field could be null.

Getting Started

On-Device

GPU Inference

Tools

Conversation & Generation

Conversation

Creation

Lifetime

`history`

`isGenerating`

`generateResponse`

`registerFunction`

`exportToJSONArray`

Cancellation of the generation

GenerationOptions

`ModelRunner`

`createConversation`

`createConversationFromHistory`

`unload`

`generateFromConversation`

`MessageResponse`

Getting Started

On-Device

GPU Inference

Tools

​Conversation

​Creation

​Lifetime

​history

​isGenerating

​generateResponse

​registerFunction

​exportToJSONArray

​Cancellation of the generation

​GenerationOptions

​ModelRunner

​createConversation

​createConversationFromHistory

​unload

​generateFromConversation

​MessageResponse

Conversation

Creation

Lifetime

`history`

`isGenerating`

`generateResponse`

`registerFunction`

`exportToJSONArray`

Cancellation of the generation

GenerationOptions

`ModelRunner`

`createConversation`

`createConversationFromHistory`

`unload`

`generateFromConversation`

`MessageResponse`