Zhihao Shen imShZh

🎯

Focusing

A full-stack SE & AI enthusiast. Majoring in CS and looking to get involved in computer systems. Another account: @Sh-Zh-7

19 followers · 16 following

Tsinghua University
Zhejiang, China
18:00 (UTC +08:00)
https://shzh.wiki

Stars

vllm-project / vllm-omni

A framework for efficient model inference with omni-modality models

Python 4,530 847 Updated Apr 28, 2026

PrimeIntellect-ai / prime-rl

Agentic RL Training at Scale

Python 1,323 272 Updated Apr 28, 2026

radixark / miles

Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.

Python 1,166 172 Updated Apr 28, 2026

perplexityai / pplx-garden

Perplexity open source garden for inference technology

Rust 401 38 Updated Dec 25, 2025

OpenRLHF / OpenRLHF

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)

Python 9,424 927 Updated Apr 28, 2026

alibaba / ROLL

An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models

Python 3,111 275 Updated Apr 28, 2026

NVIDIA / dcgm-exporter

NVIDIA GPU metrics exporter for Prometheus leveraging DCGM

Go 1,697 281 Updated Apr 7, 2026

deepspeedai / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 42,210 4,808 Updated Apr 24, 2026

inclusionAI / AReaL

The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.

Python 5,110 485 Updated Apr 28, 2026

THUDM / slime

slime is an LLM post-training framework for RL Scaling.

Python 5,508 755 Updated Apr 28, 2026

NVIDIA / gdrcopy

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C++ 1,373 185 Updated Mar 12, 2026

zhaochenyang20 / Awesome-ML-SYS-Tutorial

My learning notes for ML SYS.

Python 6,141 401 Updated Apr 23, 2026

ColfaxResearch / cfx-article-src

C++ 186 34 Updated May 7, 2025

NVIDIA / nccl

Optimized primitives for collective multi-GPU communication

C++ 4,651 1,222 Updated Apr 27, 2026

pytorch / torchtitan

A PyTorch native platform for training generative AI models

Python 5,276 799 Updated Apr 28, 2026

NVIDIA / nccl-tests

NCCL Tests

Cuda 1,501 366 Updated Apr 13, 2026

srush / LLM-Training-Puzzles

What would you do with 1000 H100s...

Jupyter Notebook 1,169 72 Updated Jan 10, 2024

linkedin / Liger-Kernel

Efficient Triton Kernels for LLM Training

Python 6,311 522 Updated Apr 27, 2026

SiriusNEO / Triton-Puzzles-Lite

Puzzles for learning Triton, play it with minimal environment configuration!

Python 682 95 Updated Mar 17, 2026

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 5,214 711 Updated Apr 28, 2026

NVIDIA / cuda-python

CUDA Python: Performance meets Productivity

Cython 3,229 276 Updated Apr 28, 2026

IST-DASLab / marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.

Python 1,062 87 Updated Sep 4, 2024

InternLM / turbomind

C++ 98 9 Updated Mar 26, 2025

MDK8888 / vllmini

A minimal implementation of vllm.

Cuda 71 Updated Jul 27, 2024

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Python 5,520 937 Updated Apr 28, 2026

zai-org / CogVideo

text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 12,687 1,281 Updated Nov 4, 2025

excalidraw / excalidraw

Virtual whiteboard for sketching hand-drawn like diagrams

TypeScript 122,099 13,449 Updated Apr 27, 2026

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda 3,327 276 Updated Apr 25, 2026

gpu-mode / lectures

Material for gpu-mode lectures

Jupyter Notebook 6,021 605 Updated Apr 22, 2026

NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch

Python 8,950 1,522 Updated Apr 27, 2026

Lists (8)

Inference

Kernel

LLM

ML compiler

Quant

SD

Serving

Training
