Parallel Computing starter project to build GPU & CPU kernels in CUDA & C++ and call them from Python using PyBind11, without a single line of CMake
Updated Oct 14, 2025 · Cuda
Multicore programming course materials (Spring 2021)
👾 CUDA examples for deep learning problems
Sample code for parallel programming using OpenMP on the CPU and CUDA on the GPU
A general cubic equation solver and quartic equation minimisation solver written for CPUs and Nvidia GPUs. For more details and results, see https://arxiv.org/abs/1903.10041. The library is available for C++/CUDA as well as for Python via Pybind11.
C++ implementation of a neural network using OpenMP and CUDA for parallelization.
K-means clustering implementation parallelized with OpenMP and CUDA for efficient computation.
Heterogeneous parallel implementation of the Connected Components problem using OpenMP, CUDA and OpenCL.
Mean Shift C++17 implementations: Sequential, OpenMP and CUDA
This repo solves the all-pairs shortest path problem with CPU threads, then further accelerates the program with CUDA using the blocked Floyd-Warshall algorithm.
K-means and DBSCAN parallel implementations in CUDA/OpenMP.
Parallel merge sort using OpenMP and CUDA
Parallel EWH
Project for the course on Parallel and Distributed Computing, with implementations of a multilayer Game of Life in OpenMP, Rust and CUDA.
Image filtering in C++, OpenMP and CUDA
This project implements Causal Multi-Head Self-Attention (CMHSA) with three backends: a single-threaded CPU version, a multi-threaded CPU version, and a GPU-accelerated version.
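Several of the K-means entries above split the work between OpenMP and CUDA. As a minimal sketch of the OpenMP side only (not code from any listed repository), one Lloyd iteration on 2-D points can parallelize the assignment step, since each point picks its nearest centroid independently. The function name `kmeans_step`, the interleaved `(x, y)` layout, and the sequential update step are assumptions made for illustration.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: one Lloyd iteration of k-means on 2-D points.
// pts holds n points as x0, y0, x1, y1, ...; centroids holds k points the same way.
// Returns how many points changed cluster (0 means converged).
std::size_t kmeans_step(const std::vector<double>& pts,
                        std::vector<double>& centroids,
                        std::vector<int>& labels) {
    const long n = static_cast<long>(labels.size());
    const int k = static_cast<int>(centroids.size() / 2);
    long changed = 0;

    // Assignment step: points are independent, so OpenMP splits the loop
    // across threads and sums the per-thread "changed" counters.
    #pragma omp parallel for reduction(+ : changed) schedule(static)
    for (long i = 0; i < n; ++i) {
        int best = 0;
        double best_d = std::numeric_limits<double>::max();
        for (int c = 0; c < k; ++c) {
            const double dx = pts[2 * i] - centroids[2 * c];
            const double dy = pts[2 * i + 1] - centroids[2 * c + 1];
            const double d = dx * dx + dy * dy;  // squared Euclidean distance
            if (d < best_d) { best_d = d; best = c; }
        }
        if (labels[i] != best) { labels[i] = best; ++changed; }
    }

    // Update step (kept sequential here): move each centroid to the mean
    // of the points assigned to it.
    std::vector<double> sum(2 * k, 0.0);
    std::vector<long> count(k, 0);
    for (long i = 0; i < n; ++i) {
        sum[2 * labels[i]] += pts[2 * i];
        sum[2 * labels[i] + 1] += pts[2 * i + 1];
        ++count[labels[i]];
    }
    for (int c = 0; c < k; ++c) {
        if (count[c] > 0) {  // leave empty clusters where they are
            centroids[2 * c] = sum[2 * c] / count[c];
            centroids[2 * c + 1] = sum[2 * c + 1] / count[c];
        }
    }
    return static_cast<std::size_t>(changed);
}
```

Compiled with `g++ -fopenmp` the assignment loop runs on multiple threads; without the flag the pragma is ignored and the same result is computed sequentially. Calling `kmeans_step` until it returns 0 gives the usual converge-when-stable loop.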
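The all-pairs shortest path entry above starts from a CPU-threaded version before moving to a blocked Floyd-Warshall in CUDA. A minimal sketch of that CPU stage, assuming the plain (non-blocked) triple loop and OpenMP threads rather than the repository's own code, is shown here. The name `floyd_warshall`, the `kInf` sentinel, and the row-major layout are illustrative choices. For a fixed intermediate vertex `k`, row `k` cannot change (it can only be relaxed through `dist[k][k] == 0`), so the remaining rows can be updated in parallel.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// "No edge" sentinel, halved so that kInf + kInf cannot overflow an int.
constexpr int kInf = std::numeric_limits<int>::max() / 2;

// Hypothetical helper: in-place Floyd-Warshall on an n x n row-major matrix.
// Assumes dist[i*n+i] == 0 and no negative cycles.
void floyd_warshall(std::vector<int>& dist, int n) {
    const std::size_t N = static_cast<std::size_t>(n);
    for (std::size_t k = 0; k < N; ++k) {
        // Each thread owns whole rows i != k; all threads only read row k,
        // so no two threads touch the same element concurrently.
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < n; ++i) {
            const std::size_t row = static_cast<std::size_t>(i);
            if (row == k) continue;     // row k is unchanged in step k
            const int dik = dist[row * N + k];
            if (dik >= kInf) continue;  // no path i -> k, nothing to relax
            for (std::size_t j = 0; j < N; ++j) {
                const int dkj = dist[k * N + j];
                // Relax i -> j through k, writing only when it improves.
                if (dkj < kInf && dik + dkj < dist[row * N + j]) {
                    dist[row * N + j] = dik + dkj;
                }
            }
        }
    }
}
```

The blocked variant mentioned in the entry reorganizes these same updates into tiles for better cache and shared-memory reuse on the GPU; this sketch keeps the simple loop order.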
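For the parallel merge sort entry above, one common OpenMP approach (not necessarily the one that repository uses) is recursive task parallelism: each half is sorted in its own task, and the parent waits with `taskwait` before merging. In this sketch, `parallel_merge_sort`, `merge_sort_rec`, and the `kCutoff` threshold are hypothetical, and the cutoff value is an untuned assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Assumed threshold: below this size, task overhead outweighs the gain,
// so the range is sorted sequentially.
constexpr std::size_t kCutoff = 1024;

// Sorts a[lo, hi) using tmp[lo, hi) as scratch space. Raw pointers are
// copied into each task while the pointed-to data stays shared.
static void merge_sort_rec(int* a, int* tmp, std::size_t lo, std::size_t hi) {
    if (hi - lo <= kCutoff) {
        std::sort(a + lo, a + hi);
        return;
    }
    std::size_t mid = lo + (hi - lo) / 2;
    #pragma omp task
    merge_sort_rec(a, tmp, lo, mid);   // left half in one task
    #pragma omp task
    merge_sort_rec(a, tmp, mid, hi);   // right half in another
    #pragma omp taskwait               // both halves must finish first
    std::merge(a + lo, a + mid, a + mid, a + hi, tmp + lo);
    std::copy(tmp + lo, tmp + hi, a + lo);
}

void parallel_merge_sort(std::vector<int>& v) {
    std::vector<int> tmp(v.size());
    int* a = v.data();
    int* t = tmp.data();
    const std::size_t n = v.size();
    // One thread starts the recursion; the team executes the spawned tasks.
    #pragma omp parallel
    #pragma omp single
    merge_sort_rec(a, t, 0, n);
}
```

The two tasks write disjoint ranges of `a` and `tmp`, so no locking is needed. Without `-fopenmp` the pragmas are ignored and this degrades to an ordinary sequential merge sort.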