Clone the repository
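Assuming a standard Git workflow; the repository URL and directory below are placeholders in the same style as the rest of this page, not values from it:

```bash
git clone <repository-url>
cd <repository-directory>
```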
Deployment
Launch command:
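A sketch of the launch command, assuming `deploy-vllm.py` is deployed with the Modal CLI:

```bash
modal deploy deploy-vllm.py
```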
Production deployment
- Since vLLM takes over 2 minutes to cold start, it is recommended to keep a minimum number of warm instances when running the inference server in production, with `min_containers = 1` and `buffer_containers = 1` (see the sketch after this list). The `buffer_containers` setting is necessary because all Modal GPUs are subject to preemption. See the Modal docs for details on cold start performance tuning.
- Warm up the vLLM server after deployment by sending a single request. This warm-up step is already included in the `deploy-vllm.py` script.
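A minimal sketch of where these settings live in a Modal app. The app name, GPU type, and function name are illustrative assumptions; only `min_containers` and `buffer_containers` come from the recommendation above:

```python
import modal

app = modal.App("vllm-server")  # app name is illustrative

@app.function(
    gpu="H100",           # GPU type is an assumption, not from the source
    min_containers=1,     # keep one container warm to avoid the ~2 min vLLM cold start
    buffer_containers=1,  # provision a spare container since Modal GPUs can be preempted
)
def serve():
    # The actual deploy-vllm.py starts the vLLM server here.
    ...
```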
Test commands
Test the deployed server with the following `curl` commands (replace `<modal-deployment-url>` with your actual deployment URL):
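A sketch of such commands, assuming the deployment exposes vLLM's standard OpenAI-compatible endpoints (`/health`, `/v1/models`, `/v1/chat/completions`); the exact routes depend on how `deploy-vllm.py` configures the server:

```bash
# Check that the server is up; this request also warms a cold instance
curl https://<modal-deployment-url>/health

# List the served models to find the model name for requests
curl https://<modal-deployment-url>/v1/models

# Send a chat completion; replace <model-name> with a name returned above
curl https://<modal-deployment-url>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model-name>",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64
  }'
```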