Starred repositories
An evolutionary approach to find small and low latency sorting networks
A tool for running small microbenchmarks on recent Intel and AMD x86 CPUs.
CUDA Python: Performance meets Productivity
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
Analyze computation-communication overlap in DeepSeek V3/R1.
A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
FlashMLA: Efficient Multi-head Latent Attention Kernels
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
MoBA: Mixture of Block Attention for Long-Context LLMs
NVIDIA Linux open GPU kernel modules with P2P support
Simple retrieval from LLMs at various context lengths to measure accuracy
Reexamining Direct Cache Access to Optimize I/O Intensive Applications for Multi-hundred-gigabit Networks
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
A flexible framework for heterogeneous LLM inference and fine-tuning optimizations
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
How to optimize common algorithms in CUDA.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Large Language Model Text Generation Inference
A high-throughput and memory-efficient inference and serving engine for LLMs
Transformer-related optimizations, including BERT and GPT
Grasper: A High Performance Distributed System for OLAP on Property Graphs.
A solver for subgraph isomorphism problems, based upon a series of papers by subsets of McCreesh, Prosser, and Trimble.
CP 2015 subgraph isomorphism experiments, data and paper