hargup

Organizations

@AGV-IIT-KGP @metakgp @Azad-Hall

Starred repositories

On-device Speech AI for Apple Silicon

Swift 6,023 549 Updated Apr 14, 2026

A hybrid programming language combining Lean4's formal verification with blazing-fast compilation, actor-based agent orchestration, AI-driven optimization, and vector-backed agent memory.

Rust 53 7 Updated Oct 25, 2025

amdgpu example code in hip/asm

C++ 59 29 Updated Apr 22, 2026

A skill for thinking

538 38 Updated Apr 13, 2026

🦄 ai that works - every tuesday 10 AM PST

TypeScript 1,739 131 Updated Apr 22, 2026

Machine Learning Engineering Open Book

Python 17,778 1,128 Updated Mar 16, 2026

🤗 smolagents: a barebones library for agents that think in code.

Python 26,843 2,515 Updated Apr 17, 2026

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

Cuda 2,218 199 Updated Apr 19, 2026

Generating Efficient AI-Centric Kernels

Python 91 17 Updated Apr 23, 2026

Verified tensor graph optimization in Lean 4: constructive soundness proofs + equality saturation + verified extraction via e-graph↔circuit bijection + multi-target code generation.

Lean 3 Updated Mar 7, 2026
Lean 10 1 Updated Mar 2, 2026

Verified GPU programming framework for Lean 4. Write type-safe WebGPU shaders with formal verification, hardware-accelerated matrix ops, and cross-platform support (Metal/Vulkan/D3D12). Build prova…

Lean 22 1 Updated Apr 6, 2026

Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge.

Cuda 19 1 Updated Feb 9, 2026

A lightweight multi-GPU inference engine for LLMs on mid/low-end GPUs.

Python 6 1 Updated Apr 2, 2026

cuGraph - RAPIDS Graph Analytics Library

Cuda 2,163 350 Updated Apr 23, 2026

KV Cache & LoRA for minGPT

Python 62 8 Updated Mar 4, 2026

A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations

Python 17,047 1,270 Updated Apr 23, 2026

Heterogeneous GPU Sharing on Kubernetes

Go 3,353 534 Updated Apr 23, 2026
Coq 73 3 Updated May 29, 2019

hardware accelerator for deep convolutional neural networks

SystemVerilog 65 6 Updated Feb 25, 2026

Open-source CUDA compiler targeting multiple GPU architectures. Compiles .cu to AMD and Tenstorrent GPUs

C 1,654 81 Updated Mar 25, 2026

Assembler for NVIDIA Maxwell architecture

Sass 1,061 171 Updated Jan 3, 2023

Intel® Nervana™ reference deep learning framework committed to best performance on all hardware

Python 3,869 808 Updated Dec 23, 2020

An ARC-AGI solution using Agentica from Symbolica

Python 175 15 Updated Feb 12, 2026

ALMA (Automated meta-Learning of Memory designs for Agentic systems) is a framework that meta-learns memory designs to replace human-engineered designs for agentic systems.

Python 202 23 Updated Apr 8, 2026

A huge collection of VHDL/Verilog open-source IP cores scraped from the web

593 171 Updated Jan 18, 2023

SQLite bindings for Lean

C 42 1 Updated Apr 23, 2026

[KernelGYM & Dr. Kernel] A distributed GPU environment and a collection of RL training methods to support RL for Kernel Generations

Python 159 18 Updated Mar 29, 2026

Multi-agent communication extension for pi coding agent

TypeScript 511 41 Updated Apr 4, 2026

A collection of GPU kernels and other experiments comparing Torch, Triton etc to Modular/Mojo

Jupyter Notebook 3 Updated Apr 7, 2026