Audio-to-text transcription CLI
What's inside?
This example demonstrates how to use the LFM2-Audio-1.5B model with llama.cpp to transcribe audio files locally in real-time.
When you combine the efficiency of llama.cpp with the power of a small audio model like LFM2-Audio-1.5B, you can build real-time applications that can run on
- smartphones
- self-driving cars
- smart home devices
- you name it
without internet connection or any other cloud service dependencies.
Intelligent audio assistants on the edge are possible, and this repository is just one example of how to build one.
Quick start
-
Clone the repository
git clone https://github.com/Liquid4All/cookbook.git
cd cookbook/examples/audio-transcription-cli -
Install uv on your system, if you don't have it already.
Click to see installation instructions for uv
macOS/Linux:
curl -LsSf https://astral.sh/uv/install.sh | shWindows:
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex" -
Download a few audio samples
uv run download_audio_samples.py -
Run the transcription CLI, and see the transcription of the audio sample in the console.
uv run transcribe --audio './audio-samples/barackobamafederalplaza.mp3' --play-audioBy passing the
--play-audioflag, you will hear the audio in the background during transcription.
Understanding the architecture
This example is a 100% local audio-to-text transcription CLI, that runs on your machine thanks to llama.cpp. Neither inputs audios nor outputs text are sent to any server. Everything runs on your machine.

llama.cpp support for audio models
llama.cpp is a super fast and lightweight open-source inference engine for Language Models. It is written in C++ and can be used to run LLMs on your local machine.
Audio support in llama.cpp is still quite experimental, and not fully integrated on the main branch of the llama.cpp project. Because of this, the Liquid AI team has released specialized llama.cpp builds that support the LFM2-Audio-1.5B model, that you will need to run this CLI.
The tool downloads the necessary llama.cpp builds for your platform automatically, so you don't need to worry about it.
Supported Platforms
The following platforms are currently supported:
- android-arm64
- macos-arm64
- ubuntu-arm64
- ubuntu-x64
If your platform is not supported, you will need to wait for the builds to be released.
Further improvements
The decoded text is not perfect, due to overlapping chunk and partial sentences that are grammatically incorrect.
To improve the transcription, we can use a text cleaning model to clean the text, in a local 2-step workflow for real-time Audio to Speech recognition.
For example, we can use
- LFM2-Audio-1.5B for audio to text extraction
- LFM2-350M for text cleaning
LFM2-350M is a small text-to-text model that can be used for tasks like text cleaning. To achieve optimal performance for your particular use case, you need to optimize your system and user prompts.
One way to do so is by using the Leap Worbench, a no-code tool that we are developing at Liquid AI for tasks like this.
If you want to get early access, join the Liquid AI Discord server and head to the #gpt5-level-slms
Need help?
Join the Liquid AI Discord Community and ask.