
Fal

Fal is a serverless generative media platform offering lightning-fast inference for AI image, video, and audio generation models.

This guide provides scripts for deploying Liquid AI models on Fal.

Clone the repository

git clone https://github.com/Liquid4All/lfm-inference

Deployment

cd fal

# Run a one-off server
fal run deploy-lfm2.py::serve

# Run a production server
fal deploy deploy-lfm2.py::serve --app-name lfm2-8b --auth private

The first run takes extra time to download the Docker image and model weights.

Test call

First, create an API key in the Fal dashboard.

Then run the following cURL commands:

export FAL_API_KEY=<your-fal-api-key>

# List deployed models
curl https://fal.run/<org-id>/<app-id>/v1/models -H "Authorization: Key $FAL_API_KEY"

# Query the deployed LFM model
curl -X POST https://fal.run/<org-id>/<app-id>/v1/chat/completions \
  -H "Authorization: Key $FAL_API_KEY" \
  --json '{
    "model": "LiquidAI/LFM2-8B-A1B",
    "messages": [
      {
        "role": "user",
        "content": "What is the melting temperature of silver?"
      }
    ],
    "max_tokens": 32,
    "temperature": 0
  }'
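
For programmatic access, the same two calls can be made from Python. This is a minimal sketch using the requests package (an assumption; it is not part of the repository), with the same <org-id> and <app-id> placeholders as above:

import os

import requests

# Placeholders: substitute the org and app IDs of your deployment.
BASE_URL = "https://fal.run/<org-id>/<app-id>/v1"
HEADERS = {"Authorization": f"Key {os.environ['FAL_API_KEY']}"}

# List deployed models.
models = requests.get(f"{BASE_URL}/models", headers=HEADERS)
models.raise_for_status()
print(models.json())

# Query the deployed LFM model.
completion = requests.post(
    f"{BASE_URL}/chat/completions",
    headers=HEADERS,
    json={
        "model": "LiquidAI/LFM2-8B-A1B",
        "messages": [
            {"role": "user", "content": "What is the melting temperature of silver?"}
        ],
        "max_tokens": 32,
        "temperature": 0,
    },
)
completion.raise_for_status()
print(completion.json()["choices"][0]["message"]["content"])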
Note: Fal endpoints expect the Key prefix in the Authorization header, not the Bearer scheme most OpenAI-compatible clients send by default.
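
Because of this, the official openai Python client needs its Authorization header overridden. A sketch, assuming the endpoint follows the OpenAI chat completions schema as the cURL example above suggests:

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://fal.run/<org-id>/<app-id>/v1",
    # The client requires an api_key, but the custom header below is what
    # actually authenticates: default_headers take precedence over the
    # Bearer token the client would otherwise send.
    api_key=os.environ["FAL_API_KEY"],
    default_headers={"Authorization": f"Key {os.environ['FAL_API_KEY']}"},
)

response = client.chat.completions.create(
    model="LiquidAI/LFM2-8B-A1B",
    messages=[{"role": "user", "content": "What is the melting temperature of silver?"}],
    max_tokens=32,
    temperature=0,
)
print(response.choices[0].message.content)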