Organizations

@AIGC-Audio

[ICLR 2026 Oral] ScaleCUA: open-source computer use agents that operate across platforms (Windows, macOS, Ubuntu, Android).

Python 1,108 78 Updated Jan 7, 2026

OpenClaw-RL: Train any agent simply by talking

Python 5,163 551 Updated Apr 28, 2026

MAI-UI: Real-World Centric Foundation GUI Agents ranging from 2B to 235B

Jupyter Notebook 1,794 178 Updated Apr 20, 2026

verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework

Python 20,996 3,763 Updated Apr 29, 2026

GroundCUA

Python 125 14 Updated Mar 24, 2026

EvoCUA: Evolving Computer Use Agent

Python 314 21 Updated Mar 31, 2026

Open Claude is an open-source coding-agent CLI for OpenAI, Gemini, DeepSeek, Ollama, Codex, GitHub Models, and 200+ models via OpenAI-compatible APIs.

TypeScript 24,881 8,099 Updated Apr 29, 2026

The repo is finally unlocked. Enjoy the party! The fastest repo in history to surpass 100K stars ⭐. Join Discord: https://discord.gg/5TUQKqFWd. Built in Rust using oh-my-codex.

Rust 189,076 109,449 Updated Apr 28, 2026

This repository contains the code and metadata of the How2 dataset.

Python 193 20 Updated Dec 30, 2024

The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python 3,481 310 Updated Jan 5, 2026

MAGI-1: Autoregressive Video Generation at Scale

Python 3,685 238 Updated Jun 17, 2025

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Python 2,783 249 Updated Dec 30, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 78,492 16,217 Updated Apr 29, 2026

EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL

Python 4,896 371 Updated Apr 6, 2026
Python 24 7 Updated Nov 26, 2025

This is the official repo for the paper "LongCat-Flash-Omni Technical Report"

Python 482 31 Updated Apr 24, 2026

Large Concept Models: Language modeling in a sentence representation space

Python 2,349 206 Updated Jan 29, 2025

Code for the book "Machine Reading Comprehension: Algorithms and Practice" (《机器阅读理解:算法与实践》)

Python 157 59 Updated Jul 25, 2024

Build local voice agents with open-source models

Python 4,692 551 Updated Apr 28, 2026

Official implementation of "HumanAesExpert: Advancing a Multi-Modality Foundation Model for Human Image Aesthetic Assessment"

Python 112 2 Updated Apr 15, 2025

An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation

Python 1,589 78 Updated Oct 16, 2025

Research code artifacts for Code World Model (CWM) including inference tools, reproducibility, and documentation.

Python 862 70 Updated Dec 26, 2025

State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!

Jupyter Notebook 2,259 155 Updated Apr 13, 2026

Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.

Python 4,762 271 Updated Jul 18, 2025

An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud.

Python 1,837 211 Updated Apr 10, 2026

[ACM MM 2025] FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis

Python 1,626 126 Updated Jan 26, 2026

Scalable and memory-optimized training of diffusion models

Python 1,358 139 Updated Apr 8, 2026

Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 12,686 1,281 Updated Nov 4, 2025