sherpa_onnx 1.10.24
Speech recognition, speech synthesis, and speaker recognition using next-gen Kaldi with onnxruntime without Internet connection.
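To depend on this release from a Dart or Flutter project, the version constraint published for the package can be declared in `pubspec.yaml`. The fragment below is a minimal sketch: only the package name and the `^1.10.24` constraint come from this page, and the surrounding file layout is assumed to be a standard pubspec.

```yaml
# pubspec.yaml (fragment). Only the sherpa_onnx line is taken from this page;
# the rest of the file is assumed to follow the usual pubspec layout.
dependencies:
  sherpa_onnx: ^1.10.24
```

The caret constraint accepts later compatible releases, so pin an exact version instead if reproducible builds matter more than picking up updates.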
Supported functions
| Speech recognition | Speech synthesis | Speaker verification | Speaker identification | 
|---|---|---|---|
| ✔️ | ✔️ | ✔️ | ✔️ | 
| Spoken Language identification | Audio tagging | Voice activity detection | 
|---|---|---|
| ✔️ | ✔️ | ✔️ | 
| Keyword spotting | Add punctuation | 
|---|---|
| ✔️ | ✔️ | 
Supported platforms
| Architecture | Android | iOS | Windows | macOS | Linux | 
|---|---|---|---|---|---|
| x64 | ✔️ | | ✔️ | ✔️ | ✔️ | 
| x86 | ✔️ | | ✔️ | | | 
| arm64 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | 
| arm32 | ✔️ | | | | ✔️ | 
| riscv64 | | | | | ✔️ | 
Supported programming languages
| 1. C++ | 2. C | 3. Python | 4. JavaScript | 
|---|---|---|---|
| ✔️ | ✔️ | ✔️ | ✔️ | 
| 5. Java | 6. C# | 7. Kotlin | 8. Swift | 
|---|---|---|---|
| ✔️ | ✔️ | ✔️ | ✔️ | 
| 9. Go | 10. Dart | 11. Rust | 12. Pascal | 
|---|---|---|---|
| ✔️ | ✔️ | ✔️ | ✔️ | 
For Rust support, please see sherpa-rs.
It also supports WebAssembly.
Introduction
This repository supports running the following functions locally:
- Speech-to-text (i.e., ASR); both streaming and non-streaming are supported
- Text-to-speech (i.e., TTS)
- Speaker identification
- Speaker verification
- Spoken language identification
- Audio tagging
- VAD (e.g., silero-vad)
- Keyword spotting
on the following platforms and operating systems:
- x86, x86_64, 32-bit ARM, 64-bit ARM (arm64, aarch64), RISC-V (riscv64)
- Linux, macOS, Windows, openKylin
- Android, WearOS
- iOS
- NodeJS
- WebAssembly
- Raspberry Pi
- RV1126
- LicheePi4A
- VisionFive 2
- 旭日X3派 (Horizon Sunrise X3 Pi)
- 爱芯派 (AXera-Pi)
- etc.
with the following APIs:
- C++, C, Python, Go, C#
- Java, Kotlin, JavaScript
- Swift, Rust
- Dart, Object Pascal
Links for Huggingface Spaces
You can visit the following Huggingface spaces to try sherpa-onnx without
installing anything. All you need is a browser.
| Description | URL | 
|---|---|
| Speech recognition | Click me | 
| Speech recognition with Whisper | Click me | 
| Speech synthesis | Click me | 
| Generate subtitles | Click me | 
| Audio tagging | Click me | 
| Spoken language identification with Whisper | Click me | 
We also have spaces built using WebAssembly. They are listed below:
| Description | Huggingface space | ModelScope space | 
|---|---|---|
| Voice activity detection with silero-vad | Click me | Address | 
| Real-time speech recognition (Chinese + English) with Zipformer | Click me | Address | 
| Real-time speech recognition (Chinese + English) with Paraformer | Click me | Address | 
| Real-time speech recognition (Chinese + English + Cantonese) with Paraformer-large | Click me | Address | 
| Real-time speech recognition (English) | Click me | Address | 
| VAD + speech recognition (Chinese + English + Korean + Japanese + Cantonese) with SenseVoice | Click me | Address | 
| VAD + speech recognition (English) with Whisper tiny.en | Click me | Address | 
| VAD + speech recognition (English) with Zipformer trained with GigaSpeech | Click me | Address | 
| VAD + speech recognition (Chinese) with Zipformer trained with WenetSpeech | Click me | Address | 
| VAD + speech recognition (Japanese) with Zipformer trained with ReazonSpeech | Click me | Address | 
| VAD + speech recognition (Thai) with Zipformer trained with GigaSpeech2 | Click me | Address | 
| VAD + speech recognition (Chinese, multiple dialects) with a TeleSpeech-ASR CTC model | Click me | Address | 
| VAD + speech recognition (English + Chinese, including multiple Chinese dialects) with Paraformer-large | Click me | Address | 
| VAD + speech recognition (English + Chinese, including multiple Chinese dialects) with Paraformer-small | Click me | Address | 
| Speech synthesis (English) | Click me | Address | 
| Speech synthesis (German) | Click me | Address | 
Links for pre-built Android APKs
| Description | URL | For users in China | 
|---|---|---|
| Streaming speech recognition | Address | Click here | 
| Text-to-speech | Address | Click here | 
| Voice activity detection (VAD) | Address | Click here | 
| VAD + non-streaming speech recognition | Address | Click here | 
| Two-pass speech recognition | Address | Click here | 
| Audio tagging | Address | Click here | 
| Audio tagging (WearOS) | Address | Click here | 
| Speaker identification | Address | Click here | 
| Spoken language identification | Address | Click here | 
| Keyword spotting | Address | Click here | 
Links for pre-built Flutter APPs
Real-time speech recognition
| Description | URL | For users in China | 
|---|---|---|
| Streaming speech recognition | Address | Click here | 
Text-to-speech
| Description | URL | For users in China | 
|---|---|---|
| Android (arm64-v8a, armeabi-v7a, x86_64) | Address | Click here | 
| Linux (x64) | Address | Click here | 
| macOS (x64) | Address | Click here | 
| macOS (arm64) | Address | Click here | 
| Windows (x64) | Address | Click here | 
Note: You need to build from source for iOS.
Links for pre-built Lazarus APPs
Generating subtitles
| Description | URL | For users in China | 
|---|---|---|
| Generate subtitles | Address | Click here | 
Links for pre-trained models
| Description | URL | 
|---|---|
| Speech recognition (speech to text, ASR) | Address | 
| Text-to-speech (TTS) | Address | 
| VAD | Address | 
| Keyword spotting | Address | 
| Audio tagging | Address | 
| Speaker identification (Speaker ID) | Address | 
| Spoken language identification (Language ID) | See multi-lingual Whisper ASR models from Speech recognition | 
| Punctuation | Address | 
Useful links
- Documentation: https://k2-fsa.github.io/sherpa/onnx/
- Bilibili demo videos: https://search.bilibili.com/all?keyword=%E6%96%B0%E4%B8%80%E4%BB%A3Kaldi
How to reach us
Please see https://k2-fsa.github.io/sherpa/social-groups.html for the next-gen Kaldi WeChat and QQ discussion groups.