# ComfyUI_StepAudioTTS **Repository Path**: micdav/ComfyUI_StepAudioTTS ## Basic Information - **Project Name**: ComfyUI_StepAudioTTS - **Description**: No description available - **Primary Language**: Unknown - **License**: Apache-2.0 - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2025-12-19 - **Last Updated**: 2025-12-19 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README 中文 | [English](README.md) # Step-Audio-TTS 的 ComfyUI 节点, 文本转语音, 可说话, 唱歌, RAP, 或者克隆声音. ![](https://github.com/billwuhao/ComfyUI_StepAudioTTS/blob/master/assets/2025-02-21_05-34-25.png) ## 更新 [2025-03-21]⚒️: 完全重构代码, 增加更多可调参数, 可根据文本长短调节 max_length. 可选 unload_model, 是否卸载模型, 加快推理速度. [2025-03-07]⚒️: 自定义说话者直接在 `ComfyUI\models\TTS\Step-Audio-speakers\speakers_info.json` 中定义, 无需节点中输入. 请将本仓库中 `Step-Audio-speakers` 文件夹移动到 `ComfyUI\models\TTS` 文件夹中, 结构如下: ``` ComfyUI\models\TTS ├── Step-Audio-Tokenizer ├── Step-Audio-speakers ├── Step-Audio-TTS-3B ``` 然后就可在 `ComfyUI\models\TTS\Step-Audio-speakers` 文件夹下随意自定义说话者即可使用. 注意说话者名称配置一定要一致: ![](https://github.com/billwuhao/ComfyUI_SparkTTS/blob/master/images/2025-03-07_03-30-51.png) [2025-03-06]⚒️: 新增录音节点 `MW Audio Recorder` 可用麦克风录制音频, 进度条显示录制进度: ![](https://github.com/billwuhao/ComfyUI_StepAudioTTS/blob/master/assets/2025-03-06_21-29-09.png) | 参数名/Parameter | 作用描述/Description | 范围/Range | 注意事項/Notes | |---------------------|------------------------------------------------------------------------------------|--------------------------------|------------------------------------------------------------------------------| | **trigger** | 录音触发开关 - 设为True开始录音
Recording trigger - Set to True to start recording | Boolean (True/False) | 需要从False切到True才能触发
Requires changing from False to True to activate | | **record_sec** | 主录音时长(秒)
Main recording duration (seconds) | 1-60 (整数/integer) | 实际时长
Actual duration | | **n_fft** | FFT窗口大小(影响频率分辨率)
FFT window size (affects frequency resolution) | 512,1024,...,4096 (512倍数/multiplies) | 值越大频率分辨率越高
Higher values give better frequency resolution | | **sensitivity** | 降噪灵敏度(值越高越激进)
Noise reduction sensitivity (higher=more aggressive) | 0.5-3.0 (步长0.1/step 0.1) | 1.2=标准办公室环境
1.2=standard office environment | | **smooth** | 时频平滑系数(值越高越自然)
Time-frequency smoothing (higher=more natural) | 1,3,5,7,9,11 (奇数/odd numbers) | 建议语音:5,音乐:7
Recommended: 5 for speech, 7 for music | | **sample_rate** | 采样率(影响音质与文件大小)
Sampling rate (affects quality & size) | 16000/44100/48000 Hz | 44100=CD音质
44100=CD quality | [2025-03-02]⚒️: 增加实验性的 `custom_mark`, 用 "()" 包围例如 `(温柔)(东北话)`, 它可能会有效. [2025-02-25]⚒️: 支持自定义说话者 `custom_speaker`. ## 安装 ``` cd ComfyUI/custom_nodes git clone https://github.com/billwuhao/ComfyUI_StepAudioTTS.git cd ComfyUI_StepAudioTTS pip install -r requirements.txt # python_embeded ./python_embeded/python.exe -m pip install -r requirements.txt ``` ## 模型下载 下载到 `ComfyUI\models\TTS` 文件夹中. ### Huggingface | 模型 | 链接 | |-------|-------| | Step-Audio-Tokenizer | [🤗huggingface](https://huggingface.co/stepfun-ai/Step-Audio-Tokenizer) | | Step-Audio-TTS-3B | [🤗huggingface](https://huggingface.co/stepfun-ai/Step-Audio-TTS-3B) | ### Modelscope | 模型 | 链接 | |-------|-------| | Step-Audio-Tokenizer | [modelscope](https://modelscope.cn/models/stepfun-ai/Step-Audio-Tokenizer) | | Step-Audio-TTS-3B | [modelscope](https://modelscope.cn/models/stepfun-ai/Step-Audio-TTS-3B) | ## 支持 中文, 英文, 韩语, 日语, 四川话, 粤语等 ## 致谢 本项目的部分代码来自: * [Step-Audio](https://github.com/stepfun-ai/Step-Audio) * [CosyVoice](https://github.com/FunAudioLLM/CosyVoice) * [transformers](https://github.com/huggingface/transformers) * [FunASR](https://github.com/modelscope/FunASR) 感谢以上所有开源项目对本项目开源做出的贡献!