This is the first stable release of mini-tensor, a PyTorch-inspired C++ tensor library designed for systems-level learning and performance experimentation.
🔧 Key Features
- 2D and 3D tensors with slicing, reshaping, and broadcasting
- Arithmetic and matrix ops (manual, Eigen, and CUDA-based)
- Neural network layers: `Linear`, `ReLU`, `Softmax`, `Sequential`
- IR tracing system with named tensor IDs and introspection
- CUDA support for `mat_mul_cuda()` and `bmm_cuda()`
- Fused CUDA kernel: `bmm_add_cuda()` for matmul + bias
- Benchmarks, reproducible builds, and device-safe memory
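
To make the "matmul + bias" fusion concrete, here is a minimal CPU reference sketch of what a fused batched matmul-plus-bias computes. It is not mini-tensor's API: the name `bmm_add_reference`, the flat row-major `std::vector<float>` layout, and the bias shape `[n]` (broadcast over batches and rows) are all assumptions made for illustration. The actual `bmm_add_cuda()` signature and bias semantics are documented in the README.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, NOT the library's API.
// Assumed layouts (row-major, contiguous):
//   a:    [batch, m, k]
//   b:    [batch, k, n]
//   bias: [n]            (assumption: broadcast over batches and rows)
// Returns out: [batch, m, n] where out[i] = a[i] * b[i] + bias.
std::vector<float> bmm_add_reference(const std::vector<float>& a,
                                     const std::vector<float>& b,
                                     const std::vector<float>& bias,
                                     std::size_t batch, std::size_t m,
                                     std::size_t k, std::size_t n) {
    assert(a.size() == batch * m * k);
    assert(b.size() == batch * k * n);
    assert(bias.size() == n);

    std::vector<float> out(batch * m * n);
    for (std::size_t i = 0; i < batch; ++i) {
        const float* a_i = a.data() + i * m * k;
        const float* b_i = b.data() + i * k * n;
        float* out_i = out.data() + i * m * n;
        for (std::size_t r = 0; r < m; ++r) {
            for (std::size_t c = 0; c < n; ++c) {
                // Seed the accumulator with the bias so the add happens in the
                // same pass as the dot product instead of a second sweep.
                float acc = bias[c];
                for (std::size_t p = 0; p < k; ++p) {
                    acc += a_i[r * k + p] * b_i[p * n + c];
                }
                out_i[r * n + c] = acc;
            }
        }
    }
    return out;
}
```

Seeding the accumulator with the bias is the CPU analogue of fusion: the output is written once, with no separate elementwise pass over it. A reference like this can also serve as a correctness oracle when comparing GPU results against CPU results on small inputs.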
📊 Performance
- Up to 600×–700× speedup on batched matmul (GPU vs CPU)
- Benchmarks shown in README and `demo.md`
🧠 Why This Exists
Built as a solo project to explore inference system internals, GPU kernel integration, and forward-pass execution from first principles.
🔗 Full code, tests, and benchmarks: README