Stars
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
The first unified, efficient, and extensible evaluation toolkit for evaluating image generation and editing models across multiple benchmarks.
Arxiv 25: Dynamic Pyramid Network for Efficient Multimodal Large Language Model
IamCreateAI / FlowCPS
Forked from yifan123/flow_grpo
An official implementation of Coefficients-Preserving Sampling for Reinforcement Learning with Flow Matching
SoulX-FlashTalk is the first 14B model to achieve sub-second start-up latency (0.87s) while maintaining a real-time throughput of 32 FPS on an 8xH800 node.
[Arxiv 2023] img2img version of Stable Diffusion. Line Art Automatic Coloring. Anime Character Remix. Style Transfer.
[NeurIPS 2025 Oral] Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
A PyTorch-native inference engine with cache, parallelism, quantization for Diffusion Transformers.
[ICLR 2026] ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
[ICCV 2025] 🔥🔥 UNO: A Universal Customization Method for Both Single and Multi-Subject Conditioning
HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation
[CVPR 2026] 🔥🔥 Official Repo of UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
Qwen-Image-Lightning: Speed up Qwen-Image model with distillation
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels with Hunyuan3D World Model
Consistency Distillation with Target Timestep Selection and Decoupled Guidance
Light Image Video Generation Inference Framework
This is the official implementation of our Señorita-2M [Weights and Dataset]: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
[ICLR 2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models