A browser driving game you control with your hands and voice, powered by models running fully local.
Steer by holding both hands up like a steering wheel. Speak commands to accelerate, brake, toggle headlights, and play music. No cloud calls, no server round-trips. Everything runs in your browser tab.
How it works
Two models run in parallel, entirely client-side:
- MediaPipe Hand Landmarker tracks your hand positions via webcam at ~30 fps. The angle between your two wrists drives the steering.
- LFM2.5-Audio-1.5B runs in a Web Worker with ONNX Runtime Web. It listens for speech via Silero VAD and transcribes each utterance on-device. Matched keywords control game state.
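The steering computation can be sketched as follows. This is a minimal illustration, not the example's actual code: the `Point` shape and `steeringAngle` helper are assumptions, modeling the normalized wrist landmarks that Hand Landmarker returns.

```typescript
// Assumed shape for a MediaPipe landmark (coordinates normalized to [0, 1]).
interface Point {
  x: number;
  y: number;
}

// Angle of the line through both wrists, in radians.
// 0 means the hands are level; tilting the "wheel" changes the sign.
function steeringAngle(leftWrist: Point, rightWrist: Point): number {
  return Math.atan2(rightWrist.y - leftWrist.y, rightWrist.x - leftWrist.x);
}
```

The angle can then be clamped and scaled into whatever steering range the game expects.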
Voice commands
| Say | Effect |
|---|---|
| speed / fast / go | Accelerate to 120 km/h |
| slow / stop / brake | Decelerate to 0 km/h |
| lights on | Enable headlights |
| lights off | Disable headlights |
| music / play | Start the techno beat |
| stop music / silence | Stop the beat |
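Keyword matching over the transcript can be sketched like this. The `Command` names and `matchCommand` helper are assumptions for illustration; the key point is that multi-word phrases like "stop music" must be checked before their single-word substrings.

```typescript
type Command =
  | "accelerate"
  | "brake"
  | "lightsOn"
  | "lightsOff"
  | "musicOn"
  | "musicOff";

// Order matters: "stop music" must match before "stop" (brake) or "music" (play).
const COMMANDS: Array<[string[], Command]> = [
  [["stop music", "silence"], "musicOff"],
  [["lights on"], "lightsOn"],
  [["lights off"], "lightsOff"],
  [["music", "play"], "musicOn"],
  [["speed", "fast", "go"], "accelerate"],
  [["slow", "stop", "brake"], "brake"],
];

// Return the first command whose keyword appears in the transcript, or null.
function matchCommand(transcript: string): Command | null {
  const text = transcript.toLowerCase();
  for (const [keywords, command] of COMMANDS) {
    if (keywords.some((k) => text.includes(k))) return command;
  }
  return null;
}
```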
Prerequisites
- Chrome 113+ or Edge 113+ (WebGPU recommended for fast audio inference; falls back to WASM)
- Webcam and microphone access
- Node.js 18+
Run locally
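A typical setup looks like the following; the exact script names depend on the example's `package.json`, so treat these as the standard npm workflow rather than verified commands.

```shell
# Install dependencies (assumes the example ships a package.json)
npm install

# Start the dev server, then open the printed localhost URL
# and grant webcam and microphone permissions when prompted
npm run dev
```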
Architecture
The game loop renders via requestAnimationFrame. Hand detection is throttled to ~30 fps so it does not block rendering. Voice processing happens off the main thread and delivers results via postMessage.
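The throttling can be sketched as a timestamp check inside the frame loop. `detectHands` and `render` are hypothetical stand-ins for the example's detection and drawing code.

```typescript
// Hypothetical hooks into the rest of the app.
declare function detectHands(now: number): void;
declare function render(now: number): void;
declare function requestAnimationFrame(cb: (now: number) => void): number;

const DETECT_INTERVAL_MS = 1000 / 30; // ~30 fps hand tracking
let lastDetection = -Infinity;

// True at most once per DETECT_INTERVAL_MS; rendering is never gated.
function shouldRunDetection(now: number): boolean {
  if (now - lastDetection < DETECT_INTERVAL_MS) return false;
  lastDetection = now;
  return true;
}

function frame(now: number): void {
  if (shouldRunDetection(now)) detectHands(now); // throttled to ~30 fps
  render(now); // runs every frame
  requestAnimationFrame(frame);
}
```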