Rongjiehuang

🎯

Focusing. I may be slow to reply.

Focusing on multimodal synthesis (speech/audio/singing), speech translation, and self-supervised learning.

530 followers · 65 following

Facebook AI Research (FAIR)
Menlo Park
rongjiehuang.github.io

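Stars (each entry gives the repository, its description, and, where available, primary language, star count, fork count, and last-update date)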

OpenGVLab / ScaleCUA

[ICLR 2026 Oral] ScaleCUA provides open-source computer use agents that can operate in cross-platform environments (Windows, macOS, Ubuntu, Android).

Python 1,108 78 Updated Jan 7, 2026

Gen-Verse / OpenClaw-RL

OpenClaw-RL: Train any agent simply by talking

Python 5,163 551 Updated Apr 28, 2026

Tongyi-MAI / MAI-UI

MAI-UI: Real-World Centric Foundation GUI Agents ranging from 2B to 235B

Jupyter Notebook 1,794 178 Updated Apr 20, 2026

verl-project / verl

verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework

Python 20,996 3,763 Updated Apr 29, 2026

ServiceNow / GroundCUA

GroundCUA

Python 125 14 Updated Mar 24, 2026

meituan / EvoCUA

EvoCUA: Evolving Computer Use Agent

Python 314 21 Updated Mar 31, 2026

Gitlawb / openclaude

Open Claude is an open-source coding-agent CLI for OpenAI, Gemini, DeepSeek, Ollama, Codex, GitHub Models, and 200+ models via OpenAI-compatible APIs.

TypeScript 24,881 8,099 Updated Apr 29, 2026

ultraworkers / claw-code

The repo is finally unlocked. Enjoy the party! The fastest repo in history to surpass 100K stars ⭐. Join Discord: https://discord.gg/5TUQKqFWd Built in Rust using oh-my-codex.

Rust 189,076 109,449 Updated Apr 28, 2026

meituan-longcat / LongCat-Flash-Thinking-2601

248 12 Updated Apr 24, 2026

srvk / how2-dataset

This repository contains code and metadata of How2 dataset

Python 193 20 Updated Dec 30, 2024

facebookresearch / sam-audio

The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python 3,481 310 Updated Jan 5, 2026

SandAI-org / MAGI-1

MAGI-1: Autoregressive Video Generation at Scale

Python 3,685 238 Updated Jun 17, 2025

facebookresearch / omnilingual-asr

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Python 2,783 249 Updated Dec 30, 2025

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 78,492 16,217 Updated Apr 29, 2026

hiyouga / EasyR1

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python 4,896 371 Updated Apr 6, 2026

umnooob / signvip

Python 24 7 Updated Nov 26, 2025

meituan-longcat / LongCat-Flash-Omni

This is the official repo for the paper "LongCat-Flash-Omni Technical Report"

Python 482 31 Updated Apr 24, 2026

facebookresearch / large_concept_model

Large Concept Models: Language modeling in a sentence representation space

Python 2,349 206 Updated Jan 29, 2025

zcgzcgzcg1 / MRC_book

Code for the book Machine Reading Comprehension: Algorithms and Practice (《机器阅读理解：算法与实践》)

Python 157 59 Updated Jul 25, 2024

huggingface / speech-to-speech

Build local voice agents with open-source models

Python 4,692 551 Updated Apr 28, 2026

KlingAIResearch / HumanAesExpert

Official implementation of "HumanAesExpert: Advancing a Multi-Modality Foundation Model for Human Image Aesthetic Assessment"

Python 112 2 Updated Apr 15, 2025

XueZeyue / DanceGRPO

An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation

Python 1,589 78 Updated Oct 16, 2025

facebookresearch / cwm

Research code artifacts for Code World Model (CWM) including inference tools, reproducibility, and documentation.

Python 862 70 Updated Dec 26, 2025

facebookresearch / perception_models

State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!

Jupyter Notebook 2,259 155 Updated Apr 13, 2026

facebookresearch / lingua

Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.

Python 4,762 271 Updated Jul 18, 2025

2U1 / Qwen-VL-Series-Finetune

An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud.

Python 1,837 211 Updated Apr 10, 2026

Fantasy-AMAP / fantasy-talking

[ACM MM 2025] FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis

Python 1,626 126 Updated Jan 26, 2026

Omni-Avatar / OmniAvatar

Python 1,822 166 Updated Aug 6, 2025

huggingface / finetrainers

Scalable and memory-optimized training of diffusion models

Python 1,358 139 Updated Apr 8, 2026

zai-org / CogVideo

Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 12,686 1,281 Updated Nov 4, 2025