Heng-Jui Chang vectominist

🎯

Focusing

PhD Candidate @ MIT CSAIL. Speech Processing and Balloon Arts.

89 followers · 17 following

Massachusetts Institute of Technology
Cambridge, MA
UTC -04:00
people.csail.mit.edu/hengjui
@hjchang87

Highlights

Pro

Stars

k2-fsa / OmniVoice

High-Quality Voice Cloning TTS for 600+ Languages

Python · 4,381 stars · 685 forks · Updated Apr 22, 2026

OpenMOSS / MOSS-TTS

MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It is designed for high‑fidelity, high‑expressiveness, and complex real‑world scenario…

Python · 1,683 stars · 158 forks · Updated Apr 13, 2026

microsoft / VibeVoice

Open-Source Frontier Voice AI

Python · 43,471 stars · 4,915 forks · Updated Apr 24, 2026

kan-bayashi / LibriTTSLabel

Alignment files of LibriTTS.

68 stars · 7 forks · Updated Mar 16, 2020

CorentinJ / librispeech-alignments

Word alignments generated by the Montreal Forced Aligner for the Librispeech dataset

Python · 180 stars · 24 forks · Updated Mar 25, 2019

pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook · 9,839 stars · 1,050 forks · Updated Apr 16, 2026

snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Python · 8,912 stars · 766 forks · Updated Mar 26, 2026

duoan / TorchCode

🔥 LeetCode for PyTorch — practice implementing softmax, attention, GPT-2 and more from scratch with instant auto-grading. Jupyter-based, self-hosted or try online.

Jupyter Notebook · 3,580 stars · 301 forks · Updated Mar 27, 2026

Tencent / StableToken

[ICLR 2026] StableToken: A state-of-the-art noise-robust semantic speech tokenizer featuring Voting-LFQ for resilient SpeechLLMs.

Python · 27 stars · 1 fork · Updated Feb 27, 2026

facebookresearch / dacvae

DACVAE

Python · 217 stars · 18 forks · Updated Dec 22, 2025

Labbeti / aac-metrics

Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.

Python · 71 stars · 8 forks · Updated Mar 22, 2026

Audio-WestlakeU / ATST-SED

This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".

Jupyter Notebook · 166 stars · 16 forks · Updated Apr 22, 2026

jimbozhang / xares

A benchmark for evaluating audio encoders on various audio tasks.

Python · 51 stars · 9 forks · Updated Apr 27, 2026

Red-Killer / shit

3,990 stars · 263 forks · Updated Feb 15, 2026

a43992899 / MARBLE

State-of-the-art pretrained music models for training, evaluation, inference

Python · 174 stars · 18 forks · Updated Jan 20, 2026

kuleshov-group / bd3lms

[ICLR 2025 Oral] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Python · 996 stars · 75 forks · Updated Jul 10, 2025

facebookresearch / spiritlm

Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".

Python · 930 stars · 64 forks · Updated Oct 28, 2024

mitmath / matrixcalc

MIT IAP short course: Matrix Calculus for Machine Learning and Beyond

Jupyter Notebook · 582 stars · 85 forks · Updated Jan 31, 2026

Alexander-H-Liu / dinosr

DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning

Python · 53 stars · 5 forks · Updated Jan 18, 2024

state-spaces / mamba

Mamba SSM architecture

Python · 18,107 stars · 1,712 forks · Updated Apr 27, 2026

facebookresearch / seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Jupyter Notebook · 11,770 stars · 1,174 forks · Updated Apr 8, 2026

facebookresearch / fairseq2

FAIR Sequence Modeling Toolkit 2

Python · 1,128 stars · 140 forks · Updated Apr 27, 2026

rtqichen / torchdiffeq

Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation.

Python · 6,410 stars · 996 forks · Updated Apr 4, 2025

ChenyangLEI / All-In-One-Deflicker

[CVPR2023] Blind Video Deflickering by Neural Filtering with a Flawed Atlas

Python · 758 stars · 45 forks · Updated May 21, 2025

nextai-translator / bob-plugin-openai-translator

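A Bob plugin for LLM-based text translation, text polishing, and grammar correction. Let's welcome a new era that no longer needs a Tower of Babel! Licensed under CC BY-NC-SA 4.0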

TypeScript · 5,649 stars · 260 forks · Updated Apr 23, 2026

facebookresearch / muavic

MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation

Python · 399 stars · 36 forks · Updated Sep 11, 2023

lucidrains / autoregressive-linear-attention-cuda

CUDA implementation of autoregressive linear attention, with all the latest research findings

Python · 46 stars · 3 forks · Updated May 23, 2023

iver56 / torch-audiomentations

Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for deep learning.

Python · 1,145 stars · 101 forks · Updated Nov 24, 2025

iver56 / audiomentations

A Python library for audio data augmentation. Useful for making audio ML models work well in the real world, not just in the lab.

Python · 2,264 stars · 219 forks · Updated Apr 13, 2026

garrettj403 / SciencePlots

Matplotlib styles for scientific plotting

Python · 8,784 stars · 804 forks · Updated Feb 25, 2026