- Facebook AI Research (FAIR)
- Menlo Park
- rongjiehuang.github.io
Stars
[ICLR 2026 Oral] ScaleCUA: open-source computer use agents that operate across platforms (Windows, macOS, Ubuntu, Android).
OpenClaw-RL: Train any agent simply by talking
MAI-UI: Real-World Centric Foundation GUI Agents ranging from 2B to 235B
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
Open Claude is an open-source coding-agent CLI for OpenAI, Gemini, DeepSeek, Ollama, Codex, GitHub Models, and 200+ models via OpenAI-compatible APIs.
The repo is finally unlocked. Enjoy the party! The fastest repo in history to surpass 100K stars ⭐. Join Discord: https://discord.gg/5TUQKqFWd Built in Rust using oh-my-codex.
This repository contains code and metadata for the How2 dataset.
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
MAGI-1: Autoregressive Video Generation at Scale
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
A high-throughput and memory-efficient inference and serving engine for LLMs
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
This is the official repo for the paper "LongCat-Flash-Omni Technical Report"
Large Concept Models: Language modeling in a sentence representation space
Build local voice agents with open-source models
Official implementation of "HumanAesExpert: Advancing a Multi-Modality Foundation Model for Human Image Aesthetic Assessment"
An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation
Research code artifacts for Code World Model (CWM) including inference tools, reproducibility, and documentation.
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud.
[ACM MM 2025] FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis
Scalable and memory-optimized training of diffusion models
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)