Speech Recognition (ASR)

Transcribe audio files to text with coli asr, which runs fully offline on local speech recognition models. Once the CLI is installed and the speech model has been downloaded, no API key or internet connection is needed.

No ListenHub API key required. This skill runs entirely on your machine. It requires the coli CLI tool — see Prerequisites below.

For AI Agents: The full content of this page is available as text at https://listenhub.ai/docs/en/skills/asr.mdx. Use WebFetch to read it before helping the user with this skill.

Prerequisites

Install the coli CLI before using this skill:

npm install -g @marswave/coli

Optional but recommended: Install ffmpeg to support more audio formats (MP4, M4A, AAC, etc.):

# macOS
brew install ffmpeg

# Ubuntu / Debian
sudo apt install ffmpeg

WAV files work without ffmpeg. Other formats require it.
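The WAV rule above can be expressed as a small check on the file extension. This is a minimal sketch: `needs_ffmpeg` is a hypothetical helper name, not part of coli, and it judges the format by extension only rather than by inspecting the file.

```shell
# Succeeds (exit 0) when the file needs ffmpeg, i.e. it is not a WAV file.
# Hypothetical helper; decides by file extension only.
needs_ffmpeg() {
  # Lowercase the extension so MEETING.WAV and meeting.wav are treated alike.
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    wav) return 1 ;;  # WAV works without ffmpeg
    *)   return 0 ;;  # everything else requires ffmpeg
  esac
}
```

For example, `needs_ffmpeg meeting.m4a && command -v ffmpeg` would tell you whether an M4A file can be handled on this machine.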

On first transcription, coli automatically downloads the required speech model (~60 MB) to ~/.coli/models/.
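To confirm the prerequisites by hand, a check along these lines may help. It is a sketch only: `check_prereqs` is a hypothetical name, and the script only reports what it finds; it installs nothing and does not verify that the downloaded model is complete.

```shell
# Report which prerequisites are present. Nothing is installed or changed.
check_prereqs() {
  status=0

  # coli is required for every transcription.
  if command -v coli >/dev/null 2>&1; then
    echo "coli: found"
  else
    echo "coli: missing (install with: npm install -g @marswave/coli)"
    status=1
  fi

  # ffmpeg is optional; without it only WAV input works.
  if command -v ffmpeg >/dev/null 2>&1; then
    echo "ffmpeg: found"
  else
    echo "ffmpeg: missing (only WAV files will work)"
  fi

  # The model directory is populated on first transcription.
  if [ -d "$HOME/.coli/models" ]; then
    echo "models: present in $HOME/.coli/models"
  else
    echo "models: not downloaded yet (fetched on first transcription)"
  fi

  return "$status"
}
```

The function returns a non-zero status only when coli itself is missing, since ffmpeg and a pre-downloaded model are both optional.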

Trigger

Invoke this skill with /asr, or use any of these phrases:

| Phrase | Language |
| --- | --- |
| transcribe / transcribe this | English |
| ASR | English |
| 转录 / 识别音频 | Chinese |
| 语音转文字 | Chinese |
| 把这段音频转成文字 | Chinese |

Quick Example

Transcribe this file: meeting.m4a

The AI checks prerequisites, reads your config, confirms the settings, and runs the transcription locally. The result appears directly in the conversation.
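Outside the conversation, the same step can be approximated directly in a terminal. The sketch below assumes that `coli asr` takes the audio path as its argument; this page names the command but does not document its arguments, so check the CLI's own help before relying on it. `transcribe` is a hypothetical wrapper name.

```shell
# Transcribe one audio file with the local coli CLI.
# Assumption: `coli asr` accepts the audio path as its argument.
transcribe() {
  audio=$1
  if [ ! -f "$audio" ]; then
    echo "error: file not found: $audio" >&2
    return 1
  fi
  if ! command -v coli >/dev/null 2>&1; then
    echo "error: coli is not installed" >&2
    return 1
  fi
  coli asr "$audio"
}

# Usage: transcribe meeting.m4a
```

The wrapper fails early with a readable message instead of letting a missing file or a missing CLI surface as a less obvious error.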

Models

| Model | Languages | Notes |
| --- | --- | --- |
| sensevoice (default) | Chinese, English, Japanese, Korean, Cantonese | Also detects language, emotion, and audio events |
| whisper-tiny.en | English only | Lighter model, English only |

sensevoice is recommended for multilingual content or when language is unknown.
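One way to read that recommendation is as a simple rule: pick the lighter model only when the audio is known to be English. The sketch below is illustrative; `pick_model` and the language codes it accepts are hypothetical, and it only prints a model name because this page does not say how a model is passed to coli.

```shell
# Print a model name for a given language code.
# Hypothetical helper; "en" is an assumed code for known-English audio.
pick_model() {
  case "$1" in
    en) echo "whisper-tiny.en" ;;  # lighter, English only
    *)  echo "sensevoice" ;;       # default; multilingual or unknown language
  esac
}
```

Preferring whisper-tiny.en for English is a design choice here, not a documented rule; sensevoice also supports English and remains the default.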

Options

AI Polish

When polish is enabled (default), the AI rewrites the raw transcript to fix punctuation, remove filler words, and improve readability — without changing meaning or summarizing.

The raw transcript is always available on request.

Output

The transcript appears inline in the conversation. The AI then offers to save it as a Markdown file in the current directory:

{audio-filename}-transcript.md

The Markdown file includes a front-matter header with source file, date, model, duration, and detected language.
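The naming pattern and header can be sketched as follows. Two assumptions are made: that {audio-filename} means the base name without its extension, and that the front-matter keys are spelled as shown. The page lists which fields appear but not their exact key names, and both function names are hypothetical.

```shell
# Build the transcript filename from the audio path.
# Assumption: {audio-filename} is the base name without its extension.
transcript_name() {
  base=$(basename "$1")
  echo "${base%.*}-transcript.md"
}

# Write a transcript with a front-matter header into the current directory.
# The key names below are illustrative, not the skill's confirmed keys.
write_transcript() {
  audio=$1 model=$2 duration=$3 language=$4 text=$5
  out=$(transcript_name "$audio")
  {
    echo "---"
    echo "source: $audio"
    echo "date: $(date +%Y-%m-%d)"
    echo "model: $model"
    echo "duration: $duration"
    echo "language: $language"
    echo "---"
    echo
    printf '%s\n' "$text"
  } > "$out"
  # Print the path that was written so callers can report it.
  echo "$out"
}
```

Under these assumptions, `write_transcript meeting.m4a sensevoice 00:01:30 en "Hello world."` writes meeting-transcript.md in the current directory.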

Composability

This skill produces transcript text that can be passed directly to other skills:

  • Transcribe a recorded interview → feed into /podcast as reference material
  • Transcribe a voice memo → use as input for /explainer

API Reference

No API calls. This skill uses the local coli asr command only.
