Stars
Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
This project aims to share technical principles and hands-on experience related to large language models (LLM engineering, LLM application deployment)
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…
MOSS-Audio-Tokenizer is a Causal Transformer-based audio tokenizer built on the CAT architecture. Trained on 3M hours of diverse audio, it supports streaming and variable bitrates, delivering SOTA …
Fast and memory-efficient exact attention
A powerful 3B-parameter, LLM-based reinforcement-learning audio editing model that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech
1 minute of voice data is enough to train a good TTS model! (few-shot voice cloning)
pyright fork with various type-checking improvements, improved VS Code support, and Pylance features built into the language server
Edit, preview and share mermaid charts/diagrams. New implementation of the live editor.
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
Awesome speech/audio LLMs, representation learning, and codec models
Your one-stop solution for voice dataset creation
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
Text Normalization & Inverse Text Normalization
Provides high-quality Chinese speech synthesis and voice cloning services based on models such as SparkTTS and OrpheusTTS.
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Official code for "EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting"
How to use our public wav2vec2 dimensional emotion model
This is an evolving repo for the paper "Towards Controllable Speech Synthesis in the Era of Large Language Models: A Systematic Survey".
Adds vLLM support to IndexTTS for faster inference.
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.