LFM2.5-Audio-1.5B-JP - Liquid Docs

LFM2.5-Audio-1.5B-JP is the Japanese-focused variant of LFM2.5-Audio-1.5B. It uses the same audio/text architecture and runtime path as the English model, with Japanese-focused ASR, TTS, and interleaved voice chat behavior.

Available on Hugging Face (LiquidAI/LFM2.5-Audio-1.5B-JP) and in GGUF format for llama.cpp.

Specifications

Property	Value
Parameters	1.5B (1.2B LM + 115M audio encoder)
Context Length	32K tokens
Audio Output	24kHz
Supported Language	Japanese

Japanese TTS

Natural Japanese speech synthesis

Japanese ASR

Japanese speech recognition

Voice Chat

Interleaved Japanese audio/text

Quick Start

Examples are given for two runtimes, liquid-audio and llama.cpp.

liquid-audio

Install:

pip install liquid-audio
pip install "liquid-audio[demo]"  # optional, for demo dependencies
pip install flash-attn --no-build-isolation  # optional, for flash attention 2

Multi-Turn Chat:

import torch
import soundfile as sf
from liquid_audio import LFM2AudioModel, LFM2AudioProcessor, ChatState, LFMModality

# Load the processor and model from the Hugging Face repo
HF_REPO = "LiquidAI/LFM2.5-Audio-1.5B-JP"
processor = LFM2AudioProcessor.from_pretrained(HF_REPO).eval()
model = LFM2AudioModel.from_pretrained(HF_REPO).eval()

# System turn: request interleaved text and audio output
chat = ChatState(processor)
chat.new_turn("system")
chat.add_text("Respond with interleaved text and audio.")
chat.end_turn()

# User turn: a spoken question read from disk
chat.new_turn("user")
wav, sampling_rate = sf.read("question_jp.wav", dtype="float32")
# Add a leading dimension: (samples,) -> (1, samples) for a mono file
wav = torch.from_numpy(wav).unsqueeze(0)
chat.add_audio(wav, sampling_rate)
chat.end_turn()

# Assistant turn: generate, printing text tokens as they arrive
chat.new_turn("assistant")

text_out, audio_out, modality_out = [], [], []
for t in model.generate_interleaved(**chat, max_new_tokens=512, audio_temperature=1.0, audio_top_k=4):
    if t.numel() == 1:
        # A single-element tensor is a text token
        print(processor.text.decode(t), end="", flush=True)
        text_out.append(t)
        modality_out.append(LFMModality.TEXT)
    else:
        # Anything larger is a frame of audio codes
        audio_out.append(t)
        modality_out.append(LFMModality.AUDIO_OUT)

# Stack the audio frames, skipping the final one, and decode them to a waveform
audio_codes = torch.stack(audio_out[:-1], 1).unsqueeze(0)
waveform = processor.decode(audio_codes)
# Write at 24 kHz, the audio output rate listed under Specifications
sf.write("answer_jp.wav", waveform.cpu()[0], 24_000)
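As a quick sanity check on the result, the written file can be inspected with Python's standard `wave` module. This is a minimal sketch and not part of liquid-audio: the helper name `wav_info` is hypothetical, and it assumes the file was written as PCM WAV, which is the format the `wave` module reads.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as f:
        rate = f.getframerate()
        channels = f.getnchannels()
        duration = f.getnframes() / rate
    return rate, channels, duration
```

Calling `wav_info("answer_jp.wav")` after the script above should report a sample rate of 24000 if the file was written as shown.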

Japanese ASR:

chat = ChatState(processor)
chat.new_turn("system")
chat.add_text("Perform ASR in japanese.")
chat.end_turn()

Japanese TTS:

chat = ChatState(processor)
chat.new_turn("system")
chat.add_text("Perform TTS in japanese.")
chat.end_turn()
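
Both snippets above stop after the system turn. As a sketch of what might come next, the helpers below add the user turn using only the ChatState methods already shown in the Multi-Turn Chat example (`new_turn`, `add_text`, `add_audio`, `end_turn`). The function names are hypothetical. The split of audio input for ASR and text input for TTS mirrors the llama.cpp commands on this page, where ASR takes `--audio` and TTS takes a `-p` text prompt. The generation call is left out because this page does not show one for these two modes.

```python
def add_asr_request(chat, wav, sampling_rate):
    """Add a user turn carrying the audio to transcribe, then open the assistant turn."""
    chat.new_turn("user")
    chat.add_audio(wav, sampling_rate)
    chat.end_turn()
    chat.new_turn("assistant")


def add_tts_request(chat, text):
    """Add a user turn carrying the text to speak, then open the assistant turn."""
    chat.new_turn("user")
    chat.add_text(text)
    chat.end_turn()
    chat.new_turn("assistant")
```

Here `chat` is expected to be a ChatState that already holds the matching system turn, and `wav` and `sampling_rate` are prepared the same way as in the Multi-Turn Chat example.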

llama.cpp

Setup:

export CKPT=/path/to/LFM2.5-Audio-1.5B-JP-GGUF
export INPUT_WAV=/path/to/input.wav
export OUTPUT_WAV=/path/to/output.wav
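
The llama.cpp commands in this section each load four files from `$CKPT`: the main model, mmproj, vocoder, and tokenizer GGUFs. A small preflight check can catch a wrong path before the CLI does. This is a sketch under assumptions: the helper name `missing_gguf_files` is hypothetical, and the file names are copied from the Q4_0 commands on this page, so other quantizations may need a different suffix.

```python
from pathlib import Path

MODEL = "LFM2.5-Audio-1.5B-JP"


def missing_gguf_files(ckpt_dir, quant="Q4_0"):
    """Return the expected GGUF file names that are absent from ckpt_dir."""
    prefixes = ["", "mmproj-", "vocoder-", "tokenizer-"]
    expected = [f"{prefix}{MODEL}-{quant}.gguf" for prefix in prefixes]
    return [name for name in expected if not (Path(ckpt_dir) / name).is_file()]
```

An empty list from `missing_gguf_files(os.environ["CKPT"])` (after `import os`) means all four files were found.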

ASR (Audio to Text):

./llama-liquid-audio-cli -m $CKPT/LFM2.5-Audio-1.5B-JP-Q4_0.gguf \
  -mm $CKPT/mmproj-LFM2.5-Audio-1.5B-JP-Q4_0.gguf \
  -mv $CKPT/vocoder-LFM2.5-Audio-1.5B-JP-Q4_0.gguf \
  --tts-speaker-file $CKPT/tokenizer-LFM2.5-Audio-1.5B-JP-Q4_0.gguf \
  -sys "Perform ASR in japanese." --audio $INPUT_WAV

TTS (Text to Audio):

./llama-liquid-audio-cli -m $CKPT/LFM2.5-Audio-1.5B-JP-Q4_0.gguf \
  -mm $CKPT/mmproj-LFM2.5-Audio-1.5B-JP-Q4_0.gguf \
  -mv $CKPT/vocoder-LFM2.5-Audio-1.5B-JP-Q4_0.gguf \
  --tts-speaker-file $CKPT/tokenizer-LFM2.5-Audio-1.5B-JP-Q4_0.gguf \
  -sys "Perform TTS in japanese." -p "こんにちは。今日はどのようなご用件でしょうか。" --output $OUTPUT_WAV

Interleaved Mode:

./llama-liquid-audio-cli -m $CKPT/LFM2.5-Audio-1.5B-JP-Q4_0.gguf \
  -mm $CKPT/mmproj-LFM2.5-Audio-1.5B-JP-Q4_0.gguf \
  -mv $CKPT/vocoder-LFM2.5-Audio-1.5B-JP-Q4_0.gguf \
  --tts-speaker-file $CKPT/tokenizer-LFM2.5-Audio-1.5B-JP-Q4_0.gguf \
  -sys "Respond with interleaved text and audio." \
  --audio $INPUT_WAV --output $OUTPUT_WAV
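
The three commands above differ only in the system prompt and the input/output flags, so they can be assembled by one helper when scripting batch runs. The sketch below builds the argument list in Python. `build_cli_args` and its parameter names are hypothetical, the flags are copied from the commands above, and nothing here is part of llama.cpp itself.

```python
import os

MODEL_FILE = "LFM2.5-Audio-1.5B-JP-Q4_0.gguf"


def build_cli_args(ckpt, system_prompt, audio_in=None, prompt=None, audio_out=None,
                   binary="./llama-liquid-audio-cli"):
    """Assemble the argument list for one llama-liquid-audio-cli run."""
    args = [
        binary,
        "-m", os.path.join(ckpt, MODEL_FILE),
        "-mm", os.path.join(ckpt, f"mmproj-{MODEL_FILE}"),
        "-mv", os.path.join(ckpt, f"vocoder-{MODEL_FILE}"),
        "--tts-speaker-file", os.path.join(ckpt, f"tokenizer-{MODEL_FILE}"),
        "-sys", system_prompt,
    ]
    if audio_in is not None:   # ASR and interleaved mode take an input WAV
        args += ["--audio", audio_in]
    if prompt is not None:     # TTS takes the text to speak
        args += ["-p", prompt]
    if audio_out is not None:  # TTS and interleaved mode write an output WAV
        args += ["--output", audio_out]
    return args
```

The returned list can be passed to `subprocess.run(args, check=True)`. Passing a list rather than a shell string avoids quoting problems with the Japanese prompt text.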

Use the same llama.cpp audio runners as LFM2.5-Audio-1.5B.