# Troubleshooting

> Common issues and solutions when working with LFM models.

## Installation Issues

Ensure you have the latest version of transformers installed. Quote the requirement so the shell does not treat `>=` as a redirect:

```bash
pip install "transformers>=4.55.0"
```

If you're using an older version, the LFM model classes may not be available.

If you run out of memory, try these solutions in order:

1. **Use a smaller model**: Try LFM2-350M instead of LFM2-1.2B
2. **Enable quantization**:

   ```python
   from transformers import AutoModelForCausalLM, BitsAndBytesConfig

   model = AutoModelForCausalLM.from_pretrained(
       model_id,
       # 4-bit loading requires the bitsandbytes package
       quantization_config=BitsAndBytesConfig(load_in_4bit=True),
       device_map="auto",
   )
   ```
3. **Reduce batch size or sequence length**
4. **Use gradient checkpointing for training**:

   ```python
   model.gradient_checkpointing_enable()
   ```

If a model download is slow or fails:

* Check your internet connection
* Try using `huggingface-cli login` if the model requires authentication
* Set a longer timeout with the environment variable `HF_HUB_DOWNLOAD_TIMEOUT=600` (value in seconds)
* Try downloading with `snapshot_download`:

  ```python
  from huggingface_hub import snapshot_download

  snapshot_download("LiquidAI/LFM2.5-1.2B-Instruct")
  ```

## Inference Issues

If outputs are repetitive or low quality, adjust the generation parameters:

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.1,
)
```

Key parameters to tune:

* `temperature`: Lower (0.3-0.5) for factual, higher (0.7-1.0) for creative
* `top_p`: 0.9 is a good default
* `repetition_penalty`: 1.1-1.2 helps avoid loops

If inference is slow, consider these optimization strategies:

1. **Use GGUF models with llama.cpp** for CPU inference
2. **Use MLX models on Apple Silicon**
3. **Enable Flash Attention** (if available):

   ```python
   model = AutoModelForCausalLM.from_pretrained(
       model_id,
       attn_implementation="flash_attention_2",  # requires the flash-attn package
   )
   ```
4. **Use vLLM for high-throughput serving**
5. **Use smaller quantization levels** (Q4 vs Q8)

If responses are cut off, increase `max_new_tokens`:

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,  # Increase this value
)
```

Also check that your input isn't too long: LFM models support 32k context, but very long inputs leave less room for output.

## Fine-tuning Issues

Common causes of poor fine-tuning results and their solutions:

1. **Learning rate too high/low**: Try 2e-4 for LoRA, 2e-5 for full fine-tuning
2. **Dataset format issues**: Verify your data matches the expected chat template
3. **Insufficient data**: Ensure you have enough training examples
4. **Data leakage**: Make sure eval data isn't in the training set

If training runs out of memory, use these memory optimization strategies:

1. **Use QLoRA** instead of full fine-tuning
2. **Reduce batch size** and increase gradient accumulation
3. **Enable gradient checkpointing**
4. **Use a smaller model** (LFM2-350M for experiments)

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    per_device_train_batch_size=1,   # smallest per-step memory footprint
    gradient_accumulation_steps=8,   # keeps the effective batch size at 8
    gradient_checkpointing=True,     # trades extra compute for lower memory
)
```

## llama.cpp / GGUF Issues

If a GGUF model fails to load:

* Ensure you're using a compatible llama.cpp version
* Check that the GGUF file downloaded completely
* Try a different quantization level (e.g., Q4\_K\_M)

If inference is slow:

* Ensure you compiled with GPU support if available
* Use an appropriate thread count: `-t $(nproc)`
* Try a more aggressive quantization (Q4\_0)

## Still Stuck?

* **Discord**: Join our [Discord community](https://discord.gg/DFU3WQeaYD) for real-time help
* **GitHub Issues**: Report bugs at [github.com/Liquid4All/docs/issues](https://github.com/Liquid4All/docs/issues)
* **Cookbook**: Check [examples](https://github.com/Liquid4All/cookbook) for working code
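
When asking for help or reporting a bug, it may be useful to include your installed package versions. The following is a minimal sketch, not an official Liquid AI tool: it uses only the Python standard library, the helper names (`parse_version`, `meets_minimum`, `environment_report`) are hypothetical, and the package list is an assumption based on the libraries mentioned on this page. It also checks the installed `transformers` against the 4.55.0 minimum given under Installation Issues.

```python
import platform
import re
from importlib import metadata

MIN_TRANSFORMERS = "4.55.0"  # minimum version stated under Installation Issues
PACKAGES = ["transformers", "torch", "huggingface_hub"]  # assumed list, adjust as needed


def parse_version(version):
    """Return the leading numeric release components, ignoring suffixes like '.dev0' or 'rc1'."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare release components numerically, padding the shorter version with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def environment_report(packages=PACKAGES):
    """Collect Python, platform, and package versions as printable lines."""
    lines = [
        f"python: {platform.python_version()}",
        f"platform: {platform.platform()}",
    ]
    for name in packages:
        try:
            lines.append(f"{name}: {metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"{name}: not installed")
    return lines


if __name__ == "__main__":
    print("\n".join(environment_report()))
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        installed = None
    if installed is None or not meets_minimum(installed, MIN_TRANSFORMERS):
        print(f'Upgrade with: pip install "transformers>={MIN_TRANSFORMERS}"')
```

Versions are compared as integer tuples rather than strings, so a release such as 4.100.0 is correctly treated as newer than 4.55.0. Pre-release suffixes are ignored, which is a simplification compared with full version parsing.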