This is the latest in a personal ML learning series:
| Repo | Params | Key Concept |
|---|---|---|
| nn-v1 | ~1.4K | Backprop from scratch (NumPy) |
| nn-v2 | ~6.8K | PyTorch autograd, bigram LM |
| nanogpt-legacy | ~10M | Attention, transformer blocks |
| nn-v4 (this) | ~1.45B | Full GPT at scale — mixed precision, Flash Attention 2 |
# LLM Training Project (~2GB Model)

A from-scratch implementation of a GPT-style transformer language model with ~1.5B parameters (~2GB on disk).
- Architecture: Decoder-only Transformer (GPT-style)
- Parameters: ~1.45 billion
- Layers: 24
- Hidden Dimension: 1536
- Attention Heads: 16
- Context Length: 2048 tokens
- Vocabulary: 50,257 (GPT-2 tokenizer)
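For reference, the specs above map naturally onto a single config object; a sketch with hypothetical field names (not necessarily the ones this repo uses):

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # Values taken from the spec above; field names are illustrative
    n_layer: int = 24            # transformer blocks
    d_model: int = 1536          # hidden dimension
    n_head: int = 16             # attention heads
    context_length: int = 2048   # max sequence length in tokens
    vocab_size: int = 50257      # GPT-2 BPE tokenizer

    @property
    def head_dim(self) -> int:
        # per-head dimension: 1536 / 16 = 96
        return self.d_model // self.n_head
```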
- Mixed precision training (FP16/BF16)
- Gradient checkpointing for memory efficiency
- Flash Attention 2 support
- Robust checkpoint/resume system
- Gradient accumulation for large effective batch sizes
- Automatic interrupt handling (SIGINT/SIGTERM)
- Weights & Biases / TensorBoard logging
- Single GPU optimized
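Two of the features above work together: the interrupt handler feeds the checkpoint/resume system. A common pattern is to trap SIGINT/SIGTERM into a flag, poll it once per optimizer step, and save before exiting. A minimal stdlib sketch; `train_step` and `save_checkpoint` are illustrative placeholders, not this repo's API:

```python
import signal

class GracefulInterrupt:
    """Turns SIGINT/SIGTERM into a flag the training loop can poll,
    so shutdown is deferred to a safe point between steps."""
    def __init__(self):
        self.requested = False
        signal.signal(signal.SIGINT, self._handle)
        signal.signal(signal.SIGTERM, self._handle)

    def _handle(self, signum, frame):
        self.requested = True  # just record the request; don't raise

# Usage in a training loop (train_step / save_checkpoint are hypothetical):
# stopper = GracefulInterrupt()
# for step in range(max_steps):
#     train_step()
#     if stopper.requested:
#         save_checkpoint(step)
#         break
```

Because the handler only sets a flag, a checkpoint is never written mid-step, which keeps resume state consistent.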
```bash
python -m venv venv
source venv/bin/activate   # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
```bash
python scripts/train.py            # start from scratch
python scripts/train.py --resume   # resume from checkpoint
```
- Dataset: The Pile (300B tokens)
- Effective Batch Size: 128-512 sequences (via gradient accumulation)
- Learning Rate: 6e-4 with cosine decay
- Optimizer: AdamW
- Expected Duration: 2-6 months on single GPU
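The 6e-4 peak learning rate with cosine decay is typically paired with a linear warmup. A dependency-free sketch of such a schedule; the warmup length and floor LR here are assumptions, not values taken from the repo:

```python
import math

def lr_at(step, max_steps, peak_lr=6e-4, warmup=2000, min_lr=6e-5):
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    if step < warmup:
        # ramp linearly from ~0 up to peak_lr over the warmup steps
        return peak_lr * (step + 1) / warmup
    # fraction of the post-warmup schedule completed, in [0, 1]
    progress = (step - warmup) / max(1, max_steps - warmup)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine
```

The same curve is available in PyTorch via a `LambdaLR` wrapping this function, which keeps the schedule checkpointable alongside the optimizer state.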
- Minimum: 16GB VRAM (RTX 4080, A4000)
- Recommended: 24GB+ VRAM (RTX 4090, A100 40GB)
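These VRAM tiers follow from a standard rule of thumb for mixed-precision AdamW training: roughly 16 bytes per parameter (2 for FP16 weights, 2 for FP16 gradients, 4 for FP32 master weights, 8 for the two FP32 Adam moments), before counting activations. At 1.45B parameters that is already about 22 GiB, which is why 24GB is the comfortable tier and 16GB cards need further savings (gradient checkpointing, small micro-batches, or optimizer tricks; the repo's exact strategy isn't detailed here):

```python
def training_footprint_gib(n_params, bytes_per_param=16):
    """Rule-of-thumb VRAM for mixed-precision AdamW, excluding activations:
    FP16 weights (2) + FP16 grads (2) + FP32 master weights (4)
    + FP32 exp_avg and exp_avg_sq (8) = 16 bytes per parameter."""
    return n_params * bytes_per_param / 1024**3

print(round(training_footprint_gib(1.45e9), 1))  # ~21.6 GiB before activations
```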
MIT License