- Massachusetts Institute of Technology
- Cambridge, MA
- UTC -04:00
- people.csail.mit.edu/hengjui
- @hjchang87
Stars
High-Quality Voice Cloning TTS for 600+ Languages
MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It is designed for high‑fidelity, high‑expressiveness, and complex real‑world scenario…
Word alignments generated by the Montreal Forced Aligner for the Librispeech dataset
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
🔥 LeetCode for PyTorch — practice implementing softmax, attention, GPT-2 and more from scratch with instant auto-grading. Jupyter-based, self-hosted or try online.
[ICLR 2026] StableToken: A state-of-the-art noise-robust semantic speech tokenizer featuring Voting-LFQ for resilient SpeechLLMs.
Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.
This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".
A benchmark for evaluating audio encoders on various audio tasks.
State-of-the-art pretrained music models for training, evaluation, inference
[ICLR 2025 Oral] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".
MIT IAP short course: Matrix Calculus for Machine Learning and Beyond
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
Foundational Models for State-of-the-Art Speech and Text Translation
Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation.
[CVPR2023] Blind Video Deflickering by Neural Filtering with a Flawed Atlas
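An LLM-based Bob plugin for text translation, text polishing, and grammar correction. Let's welcome a new era that no longer needs a Tower of Babel! Licensed under CC BY-NC-SA 4.0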
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
CUDA implementation of autoregressive linear attention, with all the latest research findings
Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.
A Python library for audio data augmentation. Useful for making audio ML models work well in the real world, not just in the lab.
Matplotlib styles for scientific plotting