Speech Recognition (ASR)
Transcribe audio files to text using coli asr, which runs fully offline on local speech recognition models. No internet connection is required after setup.
No ListenHub API key required. This skill runs entirely on your machine. It requires the coli CLI tool; see Prerequisites below.
For AI Agents: The full content of this page is available as text at https://listenhub.ai/docs/en/skills/asr.mdx. Use WebFetch to read it before helping the user with this skill.
Prerequisites
Install the coli CLI before using this skill:
npm install -g @marswave/coli
Optional but recommended: Install ffmpeg to support more audio formats (MP4, M4A, AAC, etc.):
# macOS
brew install ffmpeg
# Ubuntu / Debian
sudo apt install ffmpeg
WAV files work without ffmpeg. Other formats require it.
On first transcription, coli automatically downloads the required speech model (~60 MB) to ~/.coli/models/.
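The prerequisite checks described above could be sketched as a small POSIX shell helper. This is an illustrative sketch, not the skill's actual implementation: the function names check_asr_prereqs and needs_ffmpeg are hypothetical, and the WAV test only looks at the file extension.

```shell
# Hypothetical helper: report which prerequisites are present.
# Prints one status line per check and never aborts.
check_asr_prereqs() {
  if command -v coli >/dev/null 2>&1; then
    echo "coli: found"
  else
    echo "coli: missing (install with: npm install -g @marswave/coli)"
  fi

  if command -v ffmpeg >/dev/null 2>&1; then
    echo "ffmpeg: found"
  else
    echo "ffmpeg: missing (only WAV files will work)"
  fi

  # The model is downloaded on first transcription, so a missing
  # directory is normal on a fresh install.
  if [ -d "$HOME/.coli/models" ]; then
    echo "models: present in $HOME/.coli/models"
  else
    echo "models: not downloaded yet"
  fi
}

# Hypothetical helper: succeeds (exit 0) when the file is not a WAV file
# and therefore needs ffmpeg. Assumes the extension reflects the format.
needs_ffmpeg() {
  case "$1" in
    *.wav|*.WAV) return 1 ;;
    *) return 0 ;;
  esac
}
```

For example, `needs_ffmpeg meeting.m4a && check_asr_prereqs` would print the status lines only for files that depend on ffmpeg.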
Trigger
Invoke this skill with /asr, or use any of these phrases:
| Phrase | Language |
|---|---|
| transcribe / transcribe this | English |
| ASR | English |
| 转录 / 识别音频 | Chinese |
| 语音转文字 | Chinese |
| 把这段音频转成文字 | Chinese |
Quick Example
Transcribe this file: meeting.m4a
The AI checks prerequisites, reads your config, confirms the settings, and runs the transcription locally. The result appears directly in the conversation.
Models
| Model | Languages | Notes |
|---|---|---|
| sensevoice (default) | Chinese, English, Japanese, Korean, Cantonese | Also detects language, emotion, and audio events |
| whisper-tiny.en | English only | Lighter model, English only |
sensevoice is recommended for multilingual content or when language is unknown.
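The recommendation above amounts to a simple rule, sketched here as a shell function. The function name pick_model and the "en" language hint are assumptions made for illustration; only the two model names come from the table.

```shell
# Hypothetical helper: map a language hint to one of the two listed models.
# "en" selects the lighter English-only model; any other hint, including
# an empty one (language unknown), falls back to the multilingual default.
pick_model() {
  case "$1" in
    en) echo "whisper-tiny.en" ;;
    *)  echo "sensevoice" ;;
  esac
}
```

Defaulting to sensevoice keeps the unknown-language case safe, since that model also detects the language.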
Options
AI Polish
When polish is enabled (default), the AI rewrites the raw transcript to fix punctuation, remove filler words, and improve readability — without changing meaning or summarizing.
The raw transcript is always available on request.
Output
The transcript appears inline in the conversation. After viewing, the AI offers to save it as a Markdown file in the current directory:
{audio-filename}-transcript.md
The Markdown file includes a front-matter header with source file, date, model, duration, and detected language.
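As a rough illustration of the naming pattern and header, the sketch below derives the output filename and prints a front-matter block. It makes several assumptions: that {audio-filename} means the file name without its directory or last extension, and that the front-matter keys are named source, date, model, duration, and language. The actual key names and date format used by the skill may differ.

```shell
# Hypothetical helper: derive "{audio-filename}-transcript.md" from a path.
# Assumes the directory and the last extension are dropped.
transcript_name() {
  base=$(basename "$1")            # drop any directory part
  echo "${base%.*}-transcript.md"  # drop the last extension, if any
}

# Hypothetical helper: print a front-matter header.
# $1 = source file, $2 = model, $3 = duration, $4 = detected language.
# The key names below are illustrative guesses.
write_front_matter() {
  printf '%s\n' '---'
  printf 'source: %s\n' "$1"
  printf 'date: %s\n' "$(date +%Y-%m-%d)"
  printf 'model: %s\n' "$2"
  printf 'duration: %s\n' "$3"
  printf 'language: %s\n' "$4"
  printf '%s\n' '---'
}
```

With these helpers, `write_front_matter meeting.m4a sensevoice 00:12:34 en > "$(transcript_name meeting.m4a)"` would start a file named meeting-transcript.md in the current directory.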
Composability
This skill produces transcript text that can be passed directly to other skills:
- Transcribe a recorded interview → feed into /podcast as reference material
- Transcribe a voice memo → use as input for /explainer
API Reference
No API calls. This skill uses the local coli asr command only.