Starred repositories

A Python DSL to write Nvidia PTX for Hopper and Blackwell in JAX and PyTorch

Python 99 6 Updated Apr 25, 2026

Automated CUDA kernel performance diagnostics from NVIDIA Nsight Compute (NCU) CSV exports.

Rust 30 2 Updated Mar 18, 2026

Hy3 preview (295B A21B), a leading reasoning and agent model in its size, with great cost efficiency

Python 235 9 Updated Apr 23, 2026

A kernel library written in tilelang

Python 1,169 89 Updated Apr 23, 2026

Deep learning at the speed of light.

Rust 2,809 201 Updated Apr 26, 2026
Python 65 6 Updated Feb 5, 2026

A pure-Python implementation of the Nvidia CuTe layout algebra intended to be approachable and easy to learn.

Python 171 10 Updated Apr 23, 2026
Python 72 4 Updated Apr 25, 2026

Efficient and unified implementations for TopK-based sparse attention

Cuda 35 Updated Apr 20, 2026

CUDA kernels for linear attention variants, written in CuTe DSL and CUTLASS C++.

Python 474 50 Updated Apr 24, 2026

TraceRoot - open-source observability and self-healing layer for AI agents. YC S25

TypeScript 525 111 Updated Apr 26, 2026

Reverse engineering NVIDIA SASS instruction dictionary, kernel audits and pattern recognition across GPU architectures.

Sass 186 10 Updated Apr 24, 2026

Comparative study and experimentation on standard vs mHC vs attention residual (full and block)

Python 13 2 Updated Mar 17, 2026

PTX ISA 9.1 documentation converted to searchable markdown. Includes Claude Code skill for CUDA development.

Python 182 33 Updated Dec 24, 2025

Train the smallest LM you can that fits in 16MB. Best model wins!

Python 4,961 3,291 Updated Apr 24, 2026

autonomous systems engineering cli agent for any cloud environment: AWS, GCP, Cloudflare, etc

Go 281 15 Updated Apr 23, 2026

A plug-and-play compiler that delivers free-lunch optimizations for both inference and training.

Python 299 23 Updated Apr 23, 2026

Autonomous GPU kernel optimization system driven by AI agents.

Python 31 Updated Mar 29, 2026

MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Compute-optimal Scaling of Diffusion Language Models

Python 26 3 Updated Apr 19, 2026

Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat histor…

Rust 186 53 Updated Apr 25, 2026

IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse

87 8 Updated Mar 14, 2026

Minimal Claude Code alternative. Single Python file, zero dependencies, ~250 lines.

Python 33 2 Updated Jan 14, 2026

A spec for tinygrad (let's see if this works)

TeX 37 Updated Apr 22, 2026

Minimal and scalable research codebase in JAX, designed for rapid iteration on frontier research in LLM and other autoregressive models.

Python 520 60 Updated Apr 26, 2026

AI agents running research on single-GPU nanochat training automatically

Python 76,633 11,171 Updated Mar 26, 2026

GPU Cluster Monitoring (GCM): Large-Scale AI Research Cluster Monitoring

Python 221 35 Updated Apr 24, 2026

A lightweight inference engine supporting speculative speculative decoding (SSD).

Python 895 63 Updated Mar 22, 2026