Stars
Optimized primitives for collective multi-GPU communication
Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm
AI prompts for accelerating the research workflow.
Triton-based Symmetric Memory operators and examples
Uniconn is a unified, portable high-level C++ communication library that supports both point-to-point and collective operations across GPU clusters. Uniconn enables seamless switching between backe…
Modern C++ Programming Course (C++03/11/14/17/20/23/26)
Professionally written C++ function traits library (single header-only) for retrieving info about any function (arg types, arg count, return type, etc.)
AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming
Perplexity open source garden for inference technology
A suite of microbenchmarks developed for systems with multi-GPU per node.
MSCCL++: A GPU-driven communication stack for scalable AI applications
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
torchcomms: a modern PyTorch communications API
hpdps-group / COCCL
Forked from NVIDIA/nccl
COCCL: Compression and precision co-aware collective communication library
Distributed MoE in a Single Kernel [NeurIPS '25]
A comprehensive hands-on project for learning GPU programming with CUDA and HIP, covering fundamental concepts through advanced optimization techniques.
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…
VIP cheatsheet for Stanford's CME 295 Transformers and Large Language Models
DeepEP: an efficient expert-parallel communication library
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
Collaborative Collection of C++ Best Practices. This online resource is part of Jason Turner's collection of C++ Best Practices resources. See README.md for more information.
CMake for C++ Best Practices