# SenseVoice.cpp **Repository Path**: RapidAI/SenseVoice.cpp ## Basic Information - **Project Name**: SenseVoice.cpp - **Description**: No description available - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2025-08-01 - **Last Updated**: 2025-08-01 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # SenseVoice.cpp 「简体中文」|「[English](./README-EN.md)」 [SenseVoice](https://github.com/FunAudioLLM/SenseVoice)是具有音频理解能力的音频基础模型, 包括语音识别(ASR)、语种识别(LID)、语音情感识别(SER)和声学事件分类(AEC)或声学事件检测(AED)。 当前SenseVoice-small支持中、粤、英、日、韩语的多语言语音识别,情感识别和事件检测能力,具有极低的推理延迟。 本项目基于[ggml](https://github.com/ggerganov/ggml)推理框架。 ## 1. 特性 1. 基于ggml,不依赖其他第三方库, 致力于端侧部署 2. 特征提取参考[kaldi-native-fbank](https://github.com/csukuangfj/kaldi-native-fbank)库,支持多线程特征提取。 3. 支持flash attention解码 4. 支持Q3, Q4, Q5, Q6, Q8量化 ### 1.1 backend支持 | 后端 | 平台 | 是否支持 | |--------------------------------------|----------------------|------| | CPU | All | ✅ | | [Metal](./docs/build.md#metal-build) | Apple Silicon | ✅ | | [BLAS](./docs/build.md#blas-build) | All | ✅ | | [CUDA](./docs/build.md#cuda) | Nvidia GPU | ✅ | | [Vulkan](./docs/build.md#vulkan) | GPU | ✅ | | [Cann](./docs/build.md#cann) | Ascend NPU | 未测试 | | [BLIS](./docs/backend/BLIS.md) | All | | | [SYCL](./docs/backend/SYCL.md) | Intel and Nvidia GPU | | | [MUSA](./docs/build.md#musa) | Moore Threads GPU | | | [hipBLAS](./docs/build.md#hipblas) | AMD GPU | | ## 2. 使用 ### 直接下载模型或转换模型 可以直接从下面链接下载模型 [huggingface](https://huggingface.co/lovemefan/sense-voice-gguf) [modelscope](https://www.modelscope.cn/models/lovemefan/SenseVoiceGGUF) ```bash git lfs install git clone https://huggingface.co/lovemefan/sense-voice-gguf.git # 或从modelscope下载 git clone https://www.modelscope.cn/models/lovemefan/SenseVoiceGGUF.git ``` 或许自行下载官方模型转换 ```bash # 下载官方模型 git lfs install git clone https://www.modelscope.cn/iic/SenseVoiceSmall.git # 转换模型 python scripts/convert-pt-to-gguf.py \ --model SenseVoiceSmall \ --output /path/to/export/gguf-fp32-sense-voice-small.bin \ --out_type f32 ``` ### 非流式语音识别 silero-vad + sense voice ```bash git clone https://github.com/lovemefan/SenseVoice.cpp cd SenseVoice.cpp git submodule sync && git submodule update --init --recursive mkdir build && cd build cmake -DCMAKE_BUILD_TYPE=Release .. && make -j 8 # -t means thread num, -t 指定线程数 ./bin/sense-voice-main -m /path/gguf-fp16-sense-voice-small.bin /path/asr_example_zh.wav -t 4 -ng ``` ### 输出 当前使用sense-voice-f16模型输出 ``` $./bin/sense-voice-main -m /data/code/SenseVoice.cpp/scripts/resources/gguf-fp16-sense-voice.bin /data/code/SenseVoice.cpp/scripts/resources/SenseVoiceSmall/example/asr_example_zh.wav -t 4 sense_voice_small_init_from_file_with_params_no_state: loading model from '/data/code/SenseVoice.cpp/scripts/resources/gguf-fp16-sense-voice-small.bin' sense_voice_model_load: version: 3 sense_voice_model_load: alignment: 32 sense_voice_model_load: data offset: 444480 sense_voice_model_load: loading model sense_voice_model_load: n_vocab = 25055 sense_voice_model_load: n_encoder_hidden_state = 512 sense_voice_model_load: n_encoder_linear_units = 2048 sense_voice_model_load: n_encoder_attention_heads = 4 sense_voice_model_load: n_encoder_layers = 50 sense_voice_model_load: n_mels = 80 sense_voice_model_load: ftype = 1 sense_voice_model_load: vocab[25055] loaded sense_voice_model_load: CPU total size = 468.98 MB sense_voice_model_load: n_tensors: 1197 sense_voice_model_load: load SenseVoiceSmall takes 0.213000 second sense_voice_init_state: compute buffer (encoder) = 50.40 MB sense_voice_init_state: compute buffer (decoder) = 13.72 MB system_info: n_threads = 4 / 256 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 main: processing audio (88747 samples, 5.54669 sec) , 4 threads, 1 processors, lang = auto... sense_voice_pcm_to_feature_with_state: calculate fbank and cmvn takes 7.207 ms <|zh|><|NEUTRAL|><|Speech|><|withitn|>欢迎大家来体验达摩院推出的语音识别模型。 sense_voice_full_with_state: decoder audio use 1.011289 s, rtf is 0.182323. ``` ### 流式语音识别识别 ```bash sudo apt install libsdl2-dev ./bin/sense-voice-stream -m /path/gguf-fp16-sense-voice-small.bin ``` https://github.com/lovemefan/SenseVoice.cpp/releases/download/v1.4.0/sense-voice-straming.mp4 ## 感谢以下项目 1. 本项目借用并模仿来自[whisper.cpp](https://github.com/ggerganov/ggml/blob/master/examples/whisper/whisper.cpp) 的大部分c++代码 2. 参考来自funasr的paraformer模型结构以及前向计算 [FunASR](https://github.com/alibaba-damo-academy/FunASR) 3. 本项目参考并借用 [kaldi-native-fbank](https://github.com/csukuangfj/kaldi-native-fbank)中的fbank特征提取算法。 [FunASR](https://github.com/alibaba-damo-academy/FunASR/blob/main/runtime/onnxruntime/src/paraformer.cpp#L337C22-L372) 中的lrf + cmvn 算法 4. 借用了大量的前期工作[paraformer.cpp](https://github.com/lovemefan/paraformer.cpp), paraformer.cpp项目后续将继续更新