Mini-Tensor v0.1 – PyTorch-style tensor library in C++ with CUDA + IR tracing

This is the first stable release of mini-tensor, a PyTorch-inspired C++ tensor library designed for systems-level learning and performance experimentation.

🔧 Key Features

2D and 3D tensors with slicing, reshaping, and broadcasting
Arithmetic and matrix ops (manual, Eigen, and CUDA-based)
Neural network layers: Linear, ReLU, Softmax, Sequential
IR tracing system with named tensor IDs and introspection
CUDA support for mat_mul_cuda() and bmm_cuda()
Fused CUDA kernel: bmm_add_cuda() for matmul + bias
Benchmarks, reproducible builds, and device-safe memory
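
To make the fused matmul + bias op concrete, here is a minimal CPU reference sketch of what such a kernel computes. It is an illustration only, not mini-tensor's code: the name bmm_add_reference, the flat row-major layout, and the assumption that the bias has shape [n] and is broadcast over every row are all hypothetical. The actual signature of bmm_add_cuda() is documented in the README.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not part of mini-tensor's API).
// Assumed layouts, all row-major:
//   A: [batch, m, k], B: [batch, k, n], bias: [n]
// Returns C: [batch, m, n] with
//   C[b, i, j] = sum_p A[b, i, p] * B[b, p, j] + bias[j]
std::vector<float> bmm_add_reference(const std::vector<float>& A,
                                     const std::vector<float>& B,
                                     const std::vector<float>& bias,
                                     std::size_t batch, std::size_t m,
                                     std::size_t k, std::size_t n) {
    assert(A.size() == batch * m * k);
    assert(B.size() == batch * k * n);
    assert(bias.size() == n);

    std::vector<float> C(batch * m * n);
    for (std::size_t b = 0; b < batch; ++b) {
        for (std::size_t i = 0; i < m; ++i) {
            for (std::size_t j = 0; j < n; ++j) {
                // Start the accumulator at the bias so the add happens in the
                // same pass as the matmul, with no second sweep over C.
                float acc = bias[j];
                for (std::size_t p = 0; p < k; ++p) {
                    acc += A[(b * m + i) * k + p] * B[(b * k + p) * n + j];
                }
                C[(b * m + i) * n + j] = acc;
            }
        }
    }
    return C;
}
```

A reference like this is mainly useful as a correctness check: run the same inputs through the GPU path and compare the outputs element-wise within a small tolerance.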

📊 Performance

Up to 600×–700× speedup on batched matmul (GPU vs CPU)
Benchmarks shown in README and demo.md
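
For readers who want to reproduce a GPU vs CPU ratio themselves, a rough timing sketch follows. The helpers time_ms and speedup are hypothetical names, not part of mini-tensor, and the project's own benchmark harness may measure things differently.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: average wall-clock milliseconds per call over `iters` runs.
double time_ms(const std::function<void()>& fn, int iters) {
    assert(iters > 0);
    fn();  // one warm-up call, excluded from the measurement
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        fn();
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iters;
}

// Speedup is the baseline (CPU) time divided by the candidate (GPU) time.
double speedup(double cpu_ms, double gpu_ms) {
    assert(gpu_ms > 0.0);
    return cpu_ms / gpu_ms;
}
```

When timing a CUDA path this way, the callable should block until the device has finished (for example by calling cudaDeviceSynchronize() before returning), because kernel launches are asynchronous and would otherwise appear far faster than they are.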

🧠 Why This Exists

Built as a solo project to explore inference system internals, GPU kernel integration, and forward-pass execution from first principles.

🔗 Full code, tests, and benchmarks: README
