kashif
  • Berlin, Germany
  • 08:45 (UTC +02:00)
  • X @krasul

Highlights

  • Pro


🤗 ml-intern: an open-source ML engineer that reads papers, trains models, and ships ML models

Python 6,415 580 Updated Apr 25, 2026

Simple & Scalable Pretraining for Neural Architecture Research

Python 329 34 Updated Mar 31, 2026

Cisco Time Series Model is a time series forecasting model developed by Cisco through continued pretraining.

Jupyter Notebook 26 3 Updated Apr 24, 2026

Target Policy Optimization (JAX)

Python 24 Updated Apr 18, 2026

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

Python 142 5 Updated Apr 18, 2026

Unified benchmarking framework for time series forecasting, comparing traditional and foundation models with automated pipelines and isolated execution.

Python 11 1 Updated Apr 24, 2026

Dissecting the Duck's Innards — A DuckDB-based course on the Design and Implementation of Database System Internals

C 328 8 Updated Apr 7, 2026

Train the smallest LM you can that fits in 16MB. Best model wins!

Python 17 1 Updated Mar 23, 2026

MiSS is a novel PEFT method that features a low-rank structure but introduces a new update mechanism distinct from LoRA, achieving an excellent balance between performance and efficiency.

Python 35 1 Updated Mar 9, 2026

My Codex Skills

Shell 3,387 180 Updated Mar 29, 2026
C# 1 Updated Feb 7, 2026

dTRPO: Trajectory Reduction in Policy Optimization of Diffusion Large Language Models

Python 8 1 Updated Apr 10, 2026

Code for “Enhancing Diffusion-Based Sampling with Molecular Collective Variables”

Python 18 3 Updated Dec 17, 2025

A Lean formalisation of Maryna Viazovska's Fields Medal-winning solution to the sphere packing problem in dimension 8 and 24.

Lean 63 7 Updated Apr 7, 2026

Terminal Velocity Matching

Python 83 1 Updated Feb 14, 2026

Course website for 6.S184/6.S975: Generative AI with Stochastic Differential Equations

HTML 32 9 Updated Mar 18, 2026
Python 50 2 Updated Apr 1, 2026

Code for the papers: “Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling” and “Adaptive Block-Scaled Data Types”

Python 171 17 Updated Apr 21, 2026

Course on Flash-attention in Triton

Jupyter Notebook 98 9 Updated Feb 9, 2026

Generic building-block toolbox for training neural networks with adaptive and recursive execution. It provides reusable components to control iteration, stopping, and unrolling during training, ena…

Python 27 Updated Feb 4, 2026

Official Implementation of pMF https://arxiv.org/abs/2601.22158

Python 215 12 Updated Feb 19, 2026

Official Implementation of "Meta Flow Maps enable scalable reward alignment"

Python 33 1 Updated Mar 14, 2026

The Student's Guide to @lintool

323 22 Updated Feb 11, 2026

Simple EEG tokenizer with PyTorch datasets.

Python 5 3 Updated Mar 4, 2026

[ICLR 2026] Official code for TraceRL: Revolutionizing post-training for Diffusion LLMs, powering the SOTA TraDo series.

Python 495 39 Updated Jan 28, 2026

[ICLR 2026] Discrete Diffusion Divergence Instruct (DiDi-Instruct)

Python 153 10 Updated Mar 4, 2026
Python 148 21 Updated Sep 29, 2025