Stars
Repository to host and maintain SCALE-Sim code
PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models
A minimal tensor processing unit (TPU), inspired by Google's TPU V2 and V1
[ACL Findings 2026] Official Implementation of "FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration"
[ICML 2024] Official Implementation of SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
A curated list for Efficient Large Language Models
A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research; we are continuously improving the project. Welcome to PR the works (p…
[ICML'21 Oral] I-BERT: Integer-only BERT Quantization
Google AI 2018 BERT PyTorch implementation
Tutorials on implementing a few sequence-to-sequence (seq2seq) models with PyTorch and TorchText.
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
CUDA Templates and Python DSLs for High-Performance Linear Algebra
This is a BNN_Kernel on PyTorch for 1-bit networks in image data processing
IBM Research CUDA Implementation for the H2O version of the LightGBM package (v2.2.4)
Accompanying code for the paper "Zero-shot Knowledge Transfer via Adversarial Belief Matching"
Knowledge Extraction with No Observable Data (NeurIPS 2019)
Efficient computing methods developed by Huawei Noah's Ark Lab
Training neural networks with back-prop, feedback-alignment and direct feedback-alignment
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
Models and examples built with TensorFlow
Binarized Neural Network (BNN) for PyTorch