Stars
A curated list of papers, tools, and resources on Multi-Token Prediction (MTP) and related techniques in Large Language Models (LLMs), Speech-Language Models (SLMs), and more.
RiddleHe / nanochat
Forked from karpathy/nanochat
The best ChatGPT that $100 can buy.
Train transformer language models with reinforcement learning.
A curated list of awesome Go frameworks, libraries and software
A Curated List of Vision-Language-Action (VLA) and World Action Models (WAM) Research and Beyond
SGLang is a high-performance serving framework for large language models and multimodal models.
FrontierSWE is an ultra-long-horizon coding agent benchmark that tests implementation, performance engineering, and ML research
Reference code for the Meta-Harness paper.
🌱 A little course on Reinforcement Learning Environments for evaluating and training Language Models
250+ Fine-tuning & RL Notebooks for text, vision, audio, embedding, TTS models.
Competitive GPU kernel optimization platform.
🚀 Efficient implementations for emerging model architectures
Build your own high performance LLM inference engine in C++ and CUDA - a smaller version of vLLM
Flash Attention in ~100 lines of CUDA (forward pass only)
A curated list of academic papers and resources on Physical AI, focusing on Vision-Language-Action (VLA) models, world models, embodied AI, and robotic foundation models.
A PyTorch implementation of the GPT-OSS-20B architecture. All components are coded from scratch: RoPE with YaRN, RMSNorm, SwiGLU with clamping and residual connection, Mixture-of-Experts (MoE), Sel…
Large Language Model (LLM) Systems Paper List
📚 A curated list of awesome articles, videos, and other resources to learn and practice software architecture, patterns, and principles.
A curated list to learn about distributed systems
rvLLM: High-performance LLM inference in Rust. Drop-in vLLM replacement.
The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, codes, and related webs…
GPU Engineering for AI Systems
Curated collection of AI inference engineering resources — LLM serving, GPU kernels, quantization, distributed inference, and production deployment. Compiled from the AER Labs community.