View Source Code: browse the complete example on GitHub.
This example showcases the LFM2.5-Audio-1.5B model running entirely in the browser using WebGPU and ONNX Runtime Web. The demo provides three audio processing modes: automatic speech recognition, text-to-speech synthesis, and interleaved audio-text conversation.

What's Inside?

The demo provides three primary capabilities powered by LFM2.5-Audio-1.5B:
  • ASR (Automatic Speech Recognition): Convert spoken audio into accurate text transcriptions
  • TTS (Text-to-Speech): Transform written text into natural-sounding audio output
  • Interleaved Mode: Enable mixed conversations combining both audio and text inputs
All processing happens locally in your browser; no data is sent to external servers.

Quick Start

  1. Clone the repository
    git clone https://github.com/Liquid4All/cookbook.git
    cd cookbook/examples/audio-webgpu-demo
    
  2. Verify you have npm installed on your system
    npm --version
    
  3. Install dependencies
    npm install
    
  4. Start the development server
    npm run dev
    
  5. Access the application at http://localhost:5173 in your browser

Understanding the Architecture

This demo uses the LFM2.5-Audio-1.5B model, a 1.5 billion parameter audio model that handles both speech recognition and speech synthesis. The model has been quantized and converted to ONNX format for efficient browser-based inference.

Model Architecture

The implementation uses quantized ONNX models sourced from the LiquidAI/LFM2.5-Audio-1.5B-ONNX repository on Hugging Face. These models are optimized to run with WebGPU acceleration, providing fast inference directly in the browser.
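
As a rough sketch of what browser-side loading can look like with ONNX Runtime Web's WebGPU backend (the model file name, session options, and fallback order below are assumptions for illustration, not the demo's actual code):

  import * as ort from "onnxruntime-web/webgpu";

  // Hypothetical file name; the quantized artifacts live in the
  // LiquidAI/LFM2.5-Audio-1.5B-ONNX repository on Hugging Face.
  const MODEL_URL =
    "https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B-ONNX/resolve/main/onnx/model_q4.onnx";

  export async function createSession(): Promise<ort.InferenceSession> {
    // Prefer the WebGPU execution provider; fall back to WASM when WebGPU is unavailable.
    return ort.InferenceSession.create(MODEL_URL, {
      executionProviders: ["webgpu", "wasm"],
      graphOptimizationLevel: "all",
    });
  }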

Three Operation Modes

1. Automatic Speech Recognition (ASR)
  • Input: Audio file or microphone recording (see the capture sketch after this list)
  • Output: Text transcription
  • Use case: Transcribe meetings, lectures, or voice notes
2. Text-to-Speech (TTS)
  • Input: Written text
  • Output: Natural-sounding audio
  • Use case: Create voice assistants, audiobooks, or accessibility features
3. Interleaved Mode
  • Input: Mixed audio and text
  • Output: Conversational responses in text or audio
  • Use case: Interactive voice assistants and chatbots
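
For the ASR mode, audio must reach the model as raw PCM samples. Below is a minimal sketch of capturing a microphone clip in the browser and resampling it to mono Float32 PCM; the 16 kHz target rate and helper names are assumptions, not taken from the demo:

  // Record a short microphone clip and return mono Float32 PCM samples.
  async function recordClip(durationMs: number): Promise<Float32Array> {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const recorder = new MediaRecorder(stream);
    const chunks: Blob[] = [];
    recorder.ondataavailable = (e) => chunks.push(e.data);
    const stopped = new Promise<void>((resolve) => (recorder.onstop = () => resolve()));

    recorder.start();
    await new Promise((r) => setTimeout(r, durationMs));
    recorder.stop();
    await stopped;
    stream.getTracks().forEach((t) => t.stop());

    // 16 kHz is an assumed target rate for the model's feature extractor.
    return decodeToPcm(new Blob(chunks), 16000);
  }

  // Decode compressed audio (e.g. the recorder's webm/opus output) and resample
  // it to mono at targetRate using an OfflineAudioContext.
  async function decodeToPcm(blob: Blob, targetRate: number): Promise<Float32Array> {
    const decoded = await new AudioContext().decodeAudioData(await blob.arrayBuffer());
    const offline = new OfflineAudioContext(1, Math.ceil(decoded.duration * targetRate), targetRate);
    const source = offline.createBufferSource();
    source.buffer = decoded;
    source.connect(offline.destination);
    source.start();
    return (await offline.startRendering()).getChannelData(0);
  }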

System Requirements

WebGPU Support Required: This demo requires a modern web browser with WebGPU support:
  • Chrome 113 or later (recommended)
  • Edge 113 or later
If WebGPU is not enabled by default, you may need to manually activate it via browser flags:
  • Chrome: chrome://flags/#enable-unsafe-webgpu
  • Edge: edge://flags/#enable-unsafe-webgpu
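
Before loading the model, the app can check for WebGPU and show a fallback message when it is missing. This is a generic feature check, not necessarily how the demo itself handles it:

  // Feature-detect WebGPU before creating an inference session.
  // `as any` keeps the sketch self-contained; real projects would use @webgpu/types.
  async function hasWebGpu(): Promise<boolean> {
    const gpu = (navigator as any).gpu;
    if (!gpu) return false;
    try {
      return (await gpu.requestAdapter()) !== null;
    } catch {
      return false;
    }
  }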

Model Licensing

LFM 1.0 License: The model weights are distributed under the LFM 1.0 License. For complete licensing details, refer to the official Hugging Face repository.

Build for Production

To create an optimized production build:
npm run build
The build output will be in the dist/ directory, ready for deployment to any web server.
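If the project uses Vite's standard scripts (a reasonable assumption given the default dev-server port 5173, but worth verifying in package.json), the production build can also be previewed locally before deployment:
npm run preview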

Further Improvements

Potential enhancements for this demo:
  • Streaming Inference: Real-time processing for longer audio inputs
  • Voice Customization: Add controls for pitch, speed, and voice characteristics in TTS mode
  • Noise Reduction: Integrate preprocessing to improve ASR accuracy in noisy environments
  • Batch Processing: Support for processing multiple audio files simultaneously
  • Model Caching: Optimize initial load time with better caching strategies
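
As one hedged sketch of the model-caching idea above, the browser's Cache Storage API can keep the downloaded model bytes across visits (the cache name is illustrative and this is not the demo's current behavior):

  // Fetch model bytes through the Cache Storage API so repeat visits skip the download.
  async function fetchModelCached(url: string): Promise<ArrayBuffer> {
    const cache = await caches.open("lfm-audio-models-v1"); // illustrative cache name
    let response = await cache.match(url);
    if (!response) {
      response = await fetch(url);
      if (!response.ok) throw new Error(`Model download failed: ${response.status}`);
      await cache.put(url, response.clone());
    }
    return response.arrayBuffer();
  }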

Need help?