BlackSamorez

Vue · 1 star · Updated Apr 6, 2026

Python package for LLM compression

Python · 326 stars · 10 forks · Updated Apr 24, 2026

Quartet II Official Code

Python · 69 stars · 8 forks · Updated Mar 23, 2026

Python · 107 stars · 19 forks · Updated Feb 26, 2026

An iOS app that integrates a Large Language Model (LLM) to process audio recordings for transcription and summarization.

C++ · 17 stars · 2 forks · Updated Nov 29, 2024

QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning

C++ · 178 stars · 21 forks · Updated Nov 11, 2025

Jupyter Notebook · 121 stars · 13 forks · Updated Mar 18, 2026

First-of-its-kind AI benchmark for evaluating the protection capabilities of large language model (LLM) guard systems (guardrails and safeguards)

Python · 55 stars · 4 forks · Updated Mar 7, 2026

Autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way.

TypeScript · 61,000 stars · 6,279 forks · Updated Apr 25, 2026

Code for the EMNLP 2024 paper "Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on LLMs".

Python · 8 stars · Updated Jun 18, 2024

Python · 168 stars · 19 forks · Updated Jun 22, 2025

QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.

Python · 155 stars · 24 forks · Updated Aug 21, 2025

Technical Note: From C++98 to C++2x

147 stars · 12 forks · Updated Jun 8, 2025

QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference

Python · 121 stars · 5 forks · Updated Mar 6, 2024

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python · 1,061 stars · 86 forks · Updated Sep 4, 2024

QuIP quantization

Python · 64 stars · 6 forks · Updated Mar 17, 2024

Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.pdf and PV-Tuning: Beyond Straight-Through Estimation for Ext…

Python · 1,318 stars · 195 forks · Updated Feb 26, 2026

Go · 5 stars · Updated Feb 18, 2024

Friends don't let friends make certain types of data visualization: what are they, and why are they bad?

R · 7,047 stars · 286 forks · Updated Sep 3, 2025

Python · 590 stars · 50 forks · Updated Oct 29, 2024

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

Python · 2,294 stars · 195 forks · Updated Mar 27, 2024

Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".

Python · 280 stars · 24 forks · Updated Nov 3, 2023

Meditron is a suite of open-source medical Large Language Models (LLMs).

Python · 2,165 stars · 209 forks · Updated Apr 10, 2024

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

Python · 1,332 stars · 83 forks · Updated Mar 6, 2025

Repository for the QUIK project, enabling the use of 4-bit kernels for generative inference (EMNLP 2024).

C++ · 186 stars · 13 forks · Updated Apr 16, 2024

💎 A site that contains a systematic review of optimization methods and theory.

Jupyter Notebook · 133 stars · 110 forks · Updated Apr 6, 2026

Distributed trainer for LLMs.

Python · 589 stars · 84 forks · Updated May 20, 2024

Minimalist ML framework for Rust

Rust · 20,089 stars · 1,540 forks · Updated Apr 25, 2026

This repository is the official implementation of 'EDEN: Communication-Efficient and Robust Distributed Mean Estimation for Federated Learning' (ICML 2022).

Jupyter Notebook · 14 stars · 1 fork · Updated Aug 2, 2022