# Modal

> Modal is a serverless cloud platform for running AI/ML workloads with instant autoscaling on GPUs and CPUs.

<Tip>
  Use Modal for serverless cloud deployments with instant autoscaling, GPU access, and production-ready inference serving.
</Tip>

## Clone the repository

```bash
git clone https://github.com/Liquid4All/lfm-inference
```

## Deployment

Run the launch commands from the directory where you cloned the repository:

```bash
cd lfm-inference/modal

# Deploy the default model, LFM2 8B MoE (LiquidAI/LFM2-8B-A1B)
modal deploy deploy-vllm.py

# Deploy another LFM2 model by overriding MODEL_NAME
MODEL_NAME=LiquidAI/<model-slug> modal deploy deploy-vllm.py
```
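
The `MODEL_NAME` override is passed as an environment variable. The sketch below shows one common way to implement such a default in Python. The helper `resolve_model_name` is hypothetical and is not taken from `deploy-vllm.py`, which may read the variable differently.

```python
import os

# Default documented above; used when MODEL_NAME is unset or empty.
DEFAULT_MODEL_NAME = "LiquidAI/LFM2-8B-A1B"


def resolve_model_name(env=None):
    """Return MODEL_NAME from the environment, or the documented default.

    Hypothetical helper for illustration only. `env` can be any mapping,
    which makes the function easy to test without touching os.environ.
    """
    env = os.environ if env is None else env
    return env.get("MODEL_NAME") or DEFAULT_MODEL_NAME
```

Treating an empty `MODEL_NAME` the same as an unset one is a design choice of this sketch, not documented behavior of the script.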

See the full list of open-source LFM models on [Hugging Face](https://huggingface.co/collections/LiquidAI/lfm2).

## Production deployment

* vLLM takes over 2 minutes to cold start, so for production inference it is recommended to keep a minimum number of warm instances by setting `min_containers = 1` and `buffer_containers = 1`. The `buffer_containers` setting is necessary because all Modal GPUs are subject to [preemption](https://modal.com/docs/guide/preemption). See the [Modal cold start guide](https://modal.com/docs/guide/cold-start#overprovision-resources-with-min_containers-and-buffer_containers) for details on cold start performance tuning.
* Warm up the vLLM server after deployment by sending a single request. The [deploy-vllm.py](https://github.com/Liquid4All/lfm-inference/blob/main/modal/deploy-vllm.py) script already includes this warm-up step.
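
Because a cold start can take over 2 minutes, a client may want to wait until the server responds before sending its first real request. The following is a minimal client-side sketch using only the Python standard library. The function `wait_until_ready`, its parameters, and the default timeout values are assumptions made for illustration; this is not the warm-up logic from `deploy-vllm.py`.

```python
import time
import urllib.request


def wait_until_ready(base_url, timeout_s=600.0, interval_s=5.0,
                     probe=None, sleep=time.sleep):
    """Poll GET {base_url}/v1/models until it returns HTTP 200.

    Returns True once the server answers, or False if timeout_s elapses
    first. `probe` and `sleep` are injectable so the loop can be tested
    without a network connection. All names here are hypothetical.
    """

    def default_probe(url):
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                return resp.status == 200
        except OSError:
            # Covers URLError/HTTPError, refused connections, and timeouts.
            return False

    probe = probe or default_probe
    url = base_url.rstrip("/") + "/v1/models"
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

For example, `wait_until_ready("https://<modal-deployment-url>")` would block for up to ten minutes under these assumed defaults, checking every five seconds.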

## Test commands

Test the deployed server with the following `curl` commands (replace `<modal-deployment-url>` with your actual deployment URL):

```bash
# List the deployed model
curl https://<modal-deployment-url>/v1/models

# Query the deployed LFM model
curl -X POST https://<modal-deployment-url>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "LiquidAI/LFM2-8B-A1B",
    "messages": [
      {
        "role": "user",
        "content": "What is the melting temperature of silver?"
      }
    ],
    "max_tokens": 32,
    "temperature": 0
  }'
```
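
The same chat completion request can be sent from Python. This is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and the response parsing assumes the OpenAI-compatible response shape (`choices[0].message.content`) that the `/v1/chat/completions` route suggests.

```python
import json
import urllib.request


def build_chat_request(base_url, prompt, model="LiquidAI/LFM2-8B-A1B",
                       max_tokens=32, temperature=0):
    """Build a POST request mirroring the curl example above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, prompt, **kwargs):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-compatible response body.
    """
    request = build_chat_request(base_url, prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Call it as `chat("https://<modal-deployment-url>", "What is the melting temperature of silver?")`, replacing the placeholder with your actual deployment URL.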
