🎯
Focusing

Organizations

@ShZh-Playground @ShZh-libraries @ShZh-websites


A framework for efficient model inference with omni-modality models

Python · 4,530 stars · 847 forks · Updated Apr 28, 2026

Agentic RL Training at Scale

Python · 1,323 stars · 272 forks · Updated Apr 28, 2026

Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.

Python · 1,166 stars · 172 forks · Updated Apr 28, 2026

Perplexity open source garden for inference technology

Rust · 401 stars · 38 forks · Updated Dec 25, 2025

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)

Python · 9,424 stars · 927 forks · Updated Apr 28, 2026

An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models

Python · 3,111 stars · 275 forks · Updated Apr 28, 2026

NVIDIA GPU metrics exporter for Prometheus leveraging DCGM

Go · 1,697 stars · 281 forks · Updated Apr 7, 2026

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python · 42,210 stars · 4,808 forks · Updated Apr 24, 2026

The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.

Python · 5,110 stars · 485 forks · Updated Apr 28, 2026

slime is an LLM post-training framework for RL scaling.

Python · 5,508 stars · 755 forks · Updated Apr 28, 2026

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C++ · 1,373 stars · 185 forks · Updated Mar 12, 2026

My learning notes for ML SYS.

Python · 6,141 stars · 401 forks · Updated Apr 23, 2026

Optimized primitives for collective multi-GPU communication

C++ · 4,651 stars · 1,222 forks · Updated Apr 27, 2026

A PyTorch native platform for training generative AI models

Python · 5,276 stars · 799 forks · Updated Apr 28, 2026

NCCL Tests

Cuda · 1,501 stars · 366 forks · Updated Apr 13, 2026

What would you do with 1000 H100s...

Jupyter Notebook · 1,169 stars · 72 forks · Updated Jan 10, 2024

Efficient Triton Kernels for LLM Training

Python · 6,311 stars · 522 forks · Updated Apr 27, 2026

Puzzles for learning Triton; play them with minimal environment configuration!

Python · 682 stars · 95 forks · Updated Mar 17, 2026

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ · 5,214 stars · 711 forks · Updated Apr 28, 2026

CUDA Python: Performance meets Productivity

Cython · 3,229 stars · 276 forks · Updated Apr 28, 2026

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python · 1,062 stars · 87 forks · Updated Sep 4, 2024

C++ · 98 stars · 9 forks · Updated Mar 26, 2025

A minimal implementation of vLLM.

Cuda 71 Updated Jul 27, 2024

FlashInfer: Kernel Library for LLM Serving

Python · 5,520 stars · 937 forks · Updated Apr 28, 2026

Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python · 12,687 stars · 1,281 forks · Updated Nov 4, 2025

Virtual whiteboard for sketching hand-drawn-like diagrams

TypeScript · 122,099 stars · 13,449 forks · Updated Apr 27, 2026

Tile primitives for speedy kernels

Cuda · 3,327 stars · 276 forks · Updated Apr 25, 2026

Material for gpu-mode lectures

Jupyter Notebook · 6,021 stars · 605 forks · Updated Apr 22, 2026

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch

Python · 8,950 stars · 1,522 forks · Updated Apr 27, 2026