Uniconn is a unified, portable high-level C++ communication library that supports both point-to-point and collective operations across GPU clusters. Uniconn enables seamless switching between backe…

Cuda 3 Updated Dec 17, 2025

federico-busato / Modern-CPP-Programming

Modern C++ Programming Course (C++03/11/14/17/20/23/26)

HTML 14,950 1,063 Updated Apr 19, 2026

HexadigmSystems / FunctionTraits

Professionally written C++ function traits library (single header-only) for retrieving info about any function (arg types, arg count, return type, etc.)

C++ 49 6 Updated Sep 3, 2025

ROCm / iris

AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming

Python 188 39 Updated Apr 30, 2026

perplexityai / pplx-garden

Perplexity open source garden for inference technology

Rust 404 39 Updated Dec 25, 2025

HicrestLaboratory / Blink-GPU

A suite of microbenchmarks developed for systems with multiple GPUs per node.

Cuda 9 3 Updated Jan 22, 2026

microsoft / mscclpp

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 507 93 Updated Apr 30, 2026

tile-ai / tilelang

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

Python 5,957 538 Updated Apr 30, 2026

ROCm / mori

Modular RDMA Interface

C++ 119 37 Updated May 1, 2026

meta-pytorch / torchcomms

torchcomms: a modern PyTorch communications API

C++ 358 137 Updated Apr 30, 2026

hpdps-group / COCCL

Forked from NVIDIA/nccl

COCCL: Compression and precision co-aware collective communication library

C++ 30 3 Updated Mar 16, 2025

Mantevo / miniAMR

MiniAMR Adaptive Mesh Refinement (AMR) Mini-App

C 39 26 Updated Nov 12, 2024

osayamenja / FlashMoE

Distributed MoE in a Single Kernel [NeurIPS '25]

Cuda 249 34 Updated Apr 27, 2026

AIComputing101 / gpu-programming-101

A comprehensive hands-on project for learning GPU programming with CUDA and HIP, covering fundamental concepts through advanced optimization techniques.

C++ 35 3 Updated Nov 20, 2025

NVIDIA / nvshmem

NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…

C++ 519 76 Updated Apr 28, 2026

afshinea / stanford-cme-295-transformers-large-language-models

VIP cheatsheet for Stanford's CME 295 Transformers and Large Language Models

4,371 616 Updated Jul 27, 2025

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda 9,592 1,217 Updated Apr 29, 2026

perplexityai / pplx-kernels

Perplexity GPU Kernels

C++ 570 86 Updated Nov 7, 2025

ROCm / rocSHMEM

[DEPRECATED] Moved to ROCm/rocm-systems repo

C++ 145 44 Updated Apr 24, 2026

NVIDIA / multi-gpu-programming-models

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda 883 149 Updated Sep 26, 2025

llnl / RAJA

RAJA Performance Portability Layer (C++)

C++ 577 111 Updated Apr 30, 2026

jarro2783 / cxxopts

Lightweight C++ command line option parser

C++ 4,756 641 Updated Apr 29, 2026

cpp-best-practices / cppbestpractices

Collaborative Collection of C++ Best Practices. This online resource is part of Jason Turner's collection of C++ Best Practices resources. See README.md for more information.

8,731 907 Updated Aug 6, 2024

cpp-best-practices / cmake_template

CMake for C++ Best Practices

CMake 1,742 204 Updated Apr 15, 2026

francois-rozet / dawgz

Unleash the true power of scheduling

Python 35 3 Updated Apr 23, 2026

Doğan Sağbili dogansagbili

Lists (1)

🔮 Future ideas

Stars

NVIDIA / nccl

eth-cscs / COSMA

toddmaustin / research-prompts

burtscher / SLEEK

meta-pytorch / kraken

ParCoreLab / Uniconn