Lists (32)
AI Efficiency
Attention
Checkpointing
Collective Communication
Compilers
CUDA
Datasets
Diffusion
Distributed Systems
Graphs
gRPC
Hardware
HPC
Infrastructure as Code
Jax
Job Orchestration
K8s
Kernel Language
Language Models
Linux
Mechanistic Interpretability
NVIDIA GDS: DMA/RDMA
PEFT
PyTorch
Quantization
Research Tools
Rust Machine Learning Ecosystem
Serving
Simulation
Storage
Vision
WASM
Starred repositories
A Python DSL to write Nvidia PTX for Hopper and Blackwell in JAX and PyTorch
Automated CUDA kernel performance diagnostics from NVIDIA Nsight Compute (NCU) CSV exports.
Hy3 preview (295B A21B), a leading reasoning and agent model for its size, with great cost efficiency
A pure-Python implementation of the Nvidia CuTe layout algebra intended to be approachable and easy to learn.
Efficient and unified implementations for TopK-based sparse attention
CUDA kernels for linear attention variants, written in CuTe DSL and CUTLASS C++.
TraceRoot - open-source observability and self-healing layer for AI agents. YC S25
Reverse-engineered NVIDIA SASS instruction dictionary, with kernel audits and pattern recognition across GPU architectures.
Comparative study and experimentation on standard vs. mHC vs. attention residual (full and block).
PTX ISA 9.1 documentation converted to searchable markdown. Includes Claude Code skill for CUDA development.
Train the smallest LM you can that fits in 16MB. Best model wins!
Autonomous systems-engineering CLI agent for any cloud environment: AWS, GCP, Cloudflare, etc.
A plug-and-play compiler that delivers free-lunch optimizations for both inference and training.
Autonomous GPU kernel optimization system driven by AI agents.
MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Compute-optimal Scaling of Diffusion Language Models
Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat histor…
IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse
geohot / nanocode
Forked from 1rgs/nanocode
Minimal Claude Code alternative. Single Python file, zero dependencies, ~250 lines.
Minimal and scalable research codebase in JAX, designed for rapid iteration on frontier research in LLM and other autoregressive models.
AI agents running research on single-GPU nanochat training automatically
GPU Cluster Monitoring (GCM): Large-Scale AI Research Cluster Monitoring
A lightweight inference engine supporting speculative decoding (SSD).