# FAQs

> Frequently asked questions about LFM models and deployment.

## General Questions

### What is LFM?

Liquid Foundation Models (LFM) are a family of efficient language models built on a new hybrid architecture designed for fast training and inference. They range from 350M to 8B parameters and support text, vision, and audio modalities.

### What context length do LFM models support?

Most LFM models support a 32K-token context length for extended conversations and document processing. [LFM2.5-8B-A1B](/lfm/models/lfm25-8b-a1b) supports a 128K-token context length.

### Which inference frameworks are supported?

LFM models are compatible with:

* [Transformers](/deployment/gpu-inference/transformers) - For research and development
* [llama.cpp](/deployment/on-device/llama-cpp) - For efficient CPU inference
* [vLLM](/deployment/gpu-inference/vllm) - For high-throughput production serving
* [MLX](/deployment/on-device/mlx) - For Apple Silicon optimization
* [Ollama](/deployment/on-device/ollama) - For easy local deployment
* [LEAP](/deployment/on-device/sdk/quick-start) - For edge and mobile deployment

## Model Selection

### Which model should I use?

* **General chat/instruction following**: LFM2.5-1.2B-Instruct (recommended)
* **Vision tasks**: LFM2.5-VL-1.6B
* **Audio/speech**: LFM2.5-Audio-1.5B
* **Extraction tasks**: LFM2-1.2B-Extract or LFM2-350M-Extract
* **Edge deployment**: LFM2-350M or LFM2-700M for the smallest footprint
* **Highest performance**: LFM2.5-8B-A1B (MoE architecture, 128K context)

### What is the difference between LFM2 and LFM2.5?

LFM2.5 models are updated versions with improved training that deliver higher performance while maintaining the same architecture. We recommend using LFM2.5 variants when available.

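
The task-to-model recommendations in this section can be captured as a small lookup table. The sketch below is only an illustration: the task keys (`"chat"`, `"vision"`, and so on) and the helper `recommend_models` are hypothetical names chosen for this example, while the model names are copied from the list above.

```python
# Task-to-model recommendations copied from the Model Selection list.
# The task keys are hypothetical labels invented for this sketch.
RECOMMENDED_MODELS = {
    "chat": ["LFM2.5-1.2B-Instruct"],
    "vision": ["LFM2.5-VL-1.6B"],
    "audio": ["LFM2.5-Audio-1.5B"],
    "extraction": ["LFM2-1.2B-Extract", "LFM2-350M-Extract"],
    "edge": ["LFM2-350M", "LFM2-700M"],
    "highest_performance": ["LFM2.5-8B-A1B"],
}


def recommend_models(task: str) -> list[str]:
    """Return the recommended model names for a task key (case-insensitive)."""
    key = task.strip().lower()
    if key not in RECOMMENDED_MODELS:
        known = ", ".join(sorted(RECOMMENDED_MODELS))
        raise ValueError(f"Unknown task {task!r}; expected one of: {known}")
    # Return a copy so callers cannot mutate the table by accident.
    return list(RECOMMENDED_MODELS[key])


if __name__ == "__main__":
    for task in RECOMMENDED_MODELS:
        print(f"{task}: {', '.join(recommend_models(task))}")
```

Keeping the mapping in one place makes it easy to update when a newer LFM2.5 variant replaces an LFM2 model.
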
### What are Liquid Nanos?

[Liquid Nanos](/lfm/models/liquid-nanos) are task-specific models fine-tuned for specialized use cases such as:

* Information extraction (LFM2-Extract)
* Translation (LFM2-350M-ENJP-MT)
* RAG question answering (LFM2-1.2B-RAG)
* Meeting summarization (LFM2-2.6B-Transcript)

## Deployment

### Can I run LFM models on mobile devices?

Yes! Use the [LEAP SDK](/deployment/on-device/sdk/quick-start) to deploy models on iOS and Android devices. LEAP provides optimized inference for edge deployment and supports quantized models.

### Which model formats are available?

* **GGUF**: For llama.cpp, LM Studio, Ollama (Q4\_0, Q4\_K\_M, Q5\_K\_M, Q6\_K, Q8\_0, F16)
* **MLX**: For Apple Silicon (4-bit, 5-bit, 6-bit, 8-bit, bf16)
* **ONNX**: For cross-platform deployment with ONNX Runtime

### Which quantization level should I choose?

* **Q4\_0 / 4-bit**: Smallest size, fastest inference, some quality loss
* **Q8\_0 / 8-bit**: Good balance of size and quality
* **F16 / bf16**: Unquantized 16-bit weights, best quality, largest size

For most use cases, Q4\_K\_M or Q5\_K\_M gives good quality with a significant size reduction.

## Fine-tuning

### Can I fine-tune LFM models?

Yes! Most LFM models support fine-tuning with [TRL](/lfm/fine-tuning/trl) and [Unsloth](/lfm/fine-tuning/unsloth). Check the [Model Library](/lfm/models/complete-library) for trainability information.

### Which fine-tuning methods are supported?

* **LoRA/QLoRA**: Memory-efficient fine-tuning
* **Full fine-tuning**: For maximum customization
* **SFT (Supervised Fine-Tuning)**: For instruction tuning

## Still Have Questions?

* Join our [Discord community](https://discord.gg/DFU3WQeaYD) for real-time help
* Check the [Cookbook](https://github.com/Liquid4All/cookbook) for examples
* See [Troubleshooting](/lfm/help/troubleshooting) for common issues
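
Returning to the quantization levels listed under Deployment, a rough size estimate can make the trade-off concrete. The sketch below assumes nominal bit widths (4, 8, and 16 bits per weight) and a parameter count of 1.2 billion inferred from the "1.2B" in the model name. Both are simplifications: real GGUF and MLX files are somewhat larger because they also store quantization scales and metadata. The helper `estimate_weight_size_gb` is a hypothetical name, not part of any LFM tooling.

```python
# Nominal bits per weight for the precision levels named under Deployment.
# Assumption: this ignores per-block scales and file metadata, so the
# result is a lower bound on the real file size.
NOMINAL_BITS = {"4-bit": 4, "8-bit": 8, "16-bit": 16}


def estimate_weight_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    if num_params <= 0 or bits_per_weight <= 0:
        raise ValueError("num_params and bits_per_weight must be positive")
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Assumed parameter count, taken from the "1.2B" in the model name.
    params = 1.2e9
    for label, bits in NOMINAL_BITS.items():
        size = estimate_weight_size_gb(params, bits)
        print(f"{label}: about {size:.1f} GB")
```

Under these assumptions, moving from 16-bit to 4-bit weights cuts storage to roughly a quarter, which is why the 4-bit variants are the usual starting point for constrained devices.
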