Tiny Language Model (~1000 parameters)

[ARCHIVED] This repository is kept for historical reference. It represents an early experiment (~1394 parameters, NumPy/CuPy). The current generation neural network project is nn-v4 — a 1.5B-parameter GPT-style transformer.

A primitive character-level language model written in NumPy, with optional GPU acceleration through CuPy. Part of a learning progression: nn-v1 → nn-v2 → nn-v4.

Version	Params	Architecture	Framework	Notes
nn-v1 (this)	~1.4K	Embedding + MLP	NumPy/CuPy	Character-level; ultra-lightweight
nn-v2	~6.8K	Bigram + transformer block	PyTorch	Context window; GPU-accelerated
nn-v4	~1.45B	GPT decoder (24 layers)	PyTorch	Mixed precision, Flash Attention 2

Features

~1.4K parameters (~1,394): Ultra-lightweight neural network
GPU acceleration: Uses CuPy for NVIDIA GPU support (falls back to NumPy on CPU)
Character-level: Learns patterns at character granularity
Simple architecture: Embedding → Average Pooling → Hidden Layer (ReLU) → Output
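
The pipeline listed above (Embedding → Average Pooling → Hidden Layer (ReLU) → Output) and the CuPy-to-NumPy fallback can be sketched roughly as follows. This is an illustration rather than the code in train.py: the function names, layer sizes, initialisation, and toy text are all assumptions, and the resulting parameter count is not meant to match the repository's.

```python
import numpy as np

# Optional GPU backend: use CuPy when it imports and a CUDA device is usable,
# otherwise fall back to NumPy. (Assumed pattern; the repo may do this differently.)
try:
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()  # raises if no usable CUDA device
except Exception:
    xp = np


def init_params(vocab_size, embed_dim, hidden_dim, seed=0):
    """Create small random weights; all sizes here are hypothetical."""
    rng = np.random.default_rng(seed)
    shapes = {
        "emb": (vocab_size, embed_dim),   # one embedding row per character
        "w1": (embed_dim, hidden_dim),
        "b1": (hidden_dim,),
        "w2": (hidden_dim, vocab_size),
        "b2": (vocab_size,),
    }
    return {k: xp.asarray(rng.normal(0.0, 0.1, size=s)) for k, s in shapes.items()}


def forward(params, context_ids):
    """Return next-character probabilities for a list of context character ids."""
    ids = xp.asarray(context_ids)
    pooled = params["emb"][ids].mean(axis=0)                        # embedding + average pooling
    hidden = xp.maximum(0.0, pooled @ params["w1"] + params["b1"])  # hidden layer with ReLU
    logits = hidden @ params["w2"] + params["b2"]                   # output layer
    shifted = logits - logits.max()                                 # numerically stable softmax
    return xp.exp(shifted) / xp.exp(shifted).sum()


def count_params(params):
    """Total number of scalar weights and biases."""
    return sum(int(p.size) for p in params.values())


# Character-level vocabulary built from a toy string.
text = "hello world"
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}

params = init_params(vocab_size=len(chars), embed_dim=8, hidden_dim=16)
probs = forward(params, [stoi[c] for c in "hell"])
```

Because average pooling discards character order, a model of this shape treats the context as a bag of characters; that keeps the parameter count tiny at the cost of positional information.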

Usage

    pip install -r requirements.txt
    python train.py      # train + save model
    python generate.py   # interactive text generation

[ARCHIVED] This repository is kept for historical reference. It represents an intermediate experiment (~6.8K parameters, PyTorch bigram model). The current generation neural network project is nn-v4 — a 1.5B-parameter GPT-style transformer.

Simple Text Inference Model

A minimal character-level bigram language model using PyTorch. Part of a learning progression: nn-v1 → nn-v2 → nn-v4.

Version	Params	Architecture	Framework	Notes
nn-v1	~1.4K	Embedding + MLP	NumPy/CuPy	Character-level; ultra-lightweight
nn-v2 (this)	~6.8K	Bigram + transformer block	PyTorch	8-char context window
nn-v4	~1.45B	GPT decoder (24 layers)	PyTorch	Mixed precision, Flash Attention 2
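
For readers unfamiliar with the term, a bigram model predicts the next character from the current character alone. The sketch below shows that idea with a plain count table in NumPy. It is a deliberately simpler stand-in, not the PyTorch BigramModel in model.py; the function names, add-one smoothing, and toy text are assumptions made for illustration.

```python
import numpy as np


def fit_bigram(text):
    """Estimate P(next char | current char) from character-pair counts."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    counts = np.ones((len(chars), len(chars)))  # add-one smoothing (assumption)
    for cur, nxt in zip(text, text[1:]):
        counts[stoi[cur], stoi[nxt]] += 1
    probs = counts / counts.sum(axis=1, keepdims=True)  # normalise each row
    return chars, stoi, probs


def generate(chars, stoi, probs, start, length, seed=0):
    """Sample `length` characters, each conditioned only on the previous one."""
    rng = np.random.default_rng(seed)
    out = [start]
    for _ in range(length):
        row = probs[stoi[out[-1]]]
        out.append(chars[rng.choice(len(chars), p=row)])
    return "".join(out)


chars, stoi, probs = fit_bigram("to be or not to be")
sample = generate(chars, stoi, probs, start="t", length=20)
```

A learned bigram model replaces the count table with trainable logits of the same vocab-by-vocab shape and fits them by gradient descent.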

Files

model.py — BigramModel class
train.py — Training script (creates bigram_model.pth)
generate.py — Interactive text generation
data.txt — Shakespeare training text

Usage

    pip install -r requirements.txt
    python train.py      # train for 5000 iterations
    python generate.py   # interactive generation

[ARCHIVED] This is a legacy copy of Andrej Karpathy's nanoGPT used for early experiments. The current generation neural network project is nn-v4 — a 1.5B-parameter GPT-style transformer built from scratch.

nanogpt-legacy

An early experiment following Andrej Karpathy's nanoGPT to understand GPT-style transformer training.

Key learnings from this stage fed into the design of nn-v4.

Learning Progression

Repo	What was learned
nn-v1	Forward pass, backprop by hand (NumPy)
nn-v2	PyTorch autograd, bigram model
nanogpt-legacy	Attention mechanism, transformer blocks
nn-v4	Full GPT at scale (1.5B params, The Pile, Flash Attention 2)
