## On-Device
### iOS SDK
Deploy models natively on iPhone and iPad.
### Android SDK
Deploy models natively on Android devices.
### llama.cpp
CPU-first inference with cross-platform support.
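llama.cpp itself is a C/C++ project, but a quick way to try it from Python is the llama-cpp-python bindings. A minimal sketch, assuming you have already downloaded a quantized GGUF file; the model path is a placeholder:

```python
from llama_cpp import Llama

# Placeholder path to a locally downloaded, quantized GGUF model.
llm = Llama(model_path="models/model-q4_k_m.gguf", n_ctx=2048)

# A single CPU completion; llama.cpp can also target Metal, CUDA, and more.
output = llm("Q: Name the planets in the solar system. A:", max_tokens=48)
print(output["choices"][0]["text"])
```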
### MLX
Optimized inference on Apple Silicon.
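A minimal sketch using the mlx-lm package, assuming an Apple Silicon Mac; the checkpoint ID is an example MLX-format model from the Hub:

```python
from mlx_lm import load, generate

# Example 4-bit MLX checkpoint from the mlx-community Hub organization.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

text = generate(model, tokenizer, prompt="Apple Silicon is good at", max_tokens=64)
print(text)
```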
### ONNX
Cross-platform inference with ONNX Runtime.
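A minimal onnxruntime sketch; the file name `model.onnx` and the 1x3x224x224 input shape are placeholders for whatever graph you exported:

```python
import numpy as np
import onnxruntime as ort

# Load the exported graph; swap the provider for GPU or mobile targets.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

# Run one forward pass with dummy data shaped like the model's input.
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)
```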
### Ollama
Easy local deployment and model management.
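A sketch using the official Ollama Python client against a locally running server, assuming you have already pulled the model (e.g. `ollama pull llama3.2`):

```python
import ollama

# Talks to the local Ollama server (default http://localhost:11434).
response = ollama.chat(
    model="llama3.2",  # example tag; pull it first with `ollama pull llama3.2`
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])
```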
## GPU Inference
### Transformers
Flexible inference with Hugging Face Transformers.
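A minimal text-generation sketch with the Transformers pipeline API; the checkpoint is a small example model, and any causal LM from the Hub can stand in:

```python
from transformers import pipeline

# The pipeline wraps tokenization, generation, and decoding in one call.
generator = pipeline("text-generation", model="gpt2")  # example checkpoint

result = generator("GPU inference is fast because", max_new_tokens=40)
print(result[0]["generated_text"])
```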
### vLLM
High-throughput production serving.
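A minimal offline-batch sketch with vLLM's Python API; the model ID is an example, and vLLM also ships an OpenAI-compatible server for production use:

```python
from vllm import LLM, SamplingParams

# Load the model once; vLLM handles batching and KV-cache paging internally.
llm = LLM(model="facebook/opt-125m")  # example checkpoint
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["The future of on-device AI is"], params)
print(outputs[0].outputs[0].text)
```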
### SGLang
Structured generation and fast serving.
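A sketch of SGLang's frontend language, assuming a server already launched separately (e.g. `python -m sglang.launch_server --model-path <model> --port 30000`); the question and variable names are illustrative:

```python
import sglang as sgl

@sgl.function
def qa(s, question):
    # Build the prompt incrementally; sgl.gen marks where the model fills in text.
    s += "Q: " + question + "\n"
    s += "A: " + sgl.gen("answer", max_tokens=64)

# Assumes an SGLang server is listening on this port (see lead-in).
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

state = qa.run(question="What is structured generation?")
print(state["answer"])
```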
### Modal
Serverless GPU deployment.
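A minimal serverless sketch with Modal's Python SDK; the app name, GPU type, and model are illustrative, and you would execute it with `modal run file.py`:

```python
import modal

app = modal.App("inference-sketch")  # hypothetical app name
image = modal.Image.debian_slim().pip_install("transformers", "torch")

@app.function(gpu="A10G", image=image)
def generate(prompt: str) -> str:
    # Imports run inside the remote container, which has the pip packages.
    from transformers import pipeline
    pipe = pipeline("text-generation", model="gpt2", device=0)
    return pipe(prompt, max_new_tokens=40)[0]["generated_text"]

@app.local_entrypoint()
def main():
    # .remote() executes the function on a serverless GPU in Modal's cloud.
    print(generate.remote("Serverless GPUs let you"))
```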
### Baseten
Production model inference platform.
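Deployed Baseten models are called over HTTPS. A hedged sketch: the model ID is hypothetical, and the endpoint path and JSON payload schema depend on your deployment:

```python
import os
import requests

# Hypothetical model ID; Baseten gives each deployment its own endpoint,
# and the expected payload depends on the deployed model.
resp = requests.post(
    "https://model-abc123.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"},
    json={"prompt": "Hello from production"},
    timeout=60,
)
print(resp.json())
```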
### Fal
Fast inference API platform.
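A sketch using the fal-client package, assuming the `FAL_KEY` environment variable is set; the app ID is an example from fal's public catalog, and the result schema varies per app:

```python
import fal_client  # reads the FAL_KEY environment variable for auth

# Example app ID; subscribe() blocks until the hosted model returns.
result = fal_client.subscribe(
    "fal-ai/flux/dev",
    arguments={"prompt": "a lighthouse at dawn, photorealistic"},
)

# Image apps typically return a list of hosted image URLs (schema varies).
print(result["images"][0]["url"])
```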