dblate
🐢
AI Infrastructure
  • Baidu
  • Beijing, China

GLM-5: From Vibe Coding to Agentic Engineering

3,050 325 Updated Apr 17, 2026

Chinese-language reinforcement learning tutorial (the "Mushroom Book" 🍄). Read online at https://datawhalechina.github.io/easy-rl/

Jupyter Notebook 14,067 2,247 Updated Dec 30, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,627 1,811 Updated Apr 21, 2026

Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs

Python 1,062 146 Updated Apr 22, 2026

The awesome collection of OpenClaw skills. 5,400+ skills filtered and categorized from the official OpenClaw Skills Registry.🦞

47,081 4,625 Updated Apr 20, 2026

AI-powered, vision-driven UI automation for every platform.

TypeScript 12,784 952 Updated Apr 24, 2026

Midscene connector for PC, covering both local PCs and remote PC servers. Supports Windows/Linux/macOS. A cross-platform PC desktop automation agent based on Midscene, supporting both local and remote desktop operation.

TypeScript 44 6 Updated Jan 27, 2026

Paper list in the survey: A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

521 19 Updated Jul 3, 2025

[ACL 2026] Towards Efficient Large Language Model Serving: A Survey on System-Aware KV Cache Optimization

Python 266 14 Updated Apr 21, 2026

Democratizing Reinforcement Learning for LLMs

Python 5,447 546 Updated Apr 23, 2026

SpotServe: Serving Generative Large Language Models on Preemptible Instances

134 16 Updated Feb 22, 2024

Persist and reuse KV Cache to speed up your LLM.

Python 272 73 Updated Apr 24, 2026

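This project aims to share the technical principles and hands-on experience behind large models (large-model engineering and deploying large-model applications in practice).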

HTML 24,106 2,770 Updated Mar 12, 2026

TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

Python 3,470 251 Updated Apr 15, 2026

Offline optimization of your disaggregated Dynamo graph

Python 274 104 Updated Apr 24, 2026

Nano vLLM

Python 13,105 1,988 Updated Apr 13, 2026

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 4,054 594 Updated Mar 13, 2026

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

Python 5,665 517 Updated Apr 23, 2026

A Simplified PyTorch Implementation of Vision Transformer (ViT)

Jupyter Notebook 252 43 Updated Jun 10, 2024
C++ 58 61 Updated Apr 24, 2026

Let's Learn AI SYStem

Python 40 240 Updated Jan 30, 2026

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python 799 214 Updated Apr 2, 2026

Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, TensorRT-LLM, and Triton

Go 431 74 Updated Apr 23, 2026

Puzzles for learning Triton

Jupyter Notebook 2,397 220 Updated Apr 1, 2026

The best ChatGPT that $100 can buy.

Python 52,430 6,986 Updated Apr 14, 2026

Uniform Manifold Approximation and Projection

Python 8,157 863 Updated Apr 18, 2026

A workload for deploying LLM inference services on Kubernetes

Go 208 54 Updated Apr 24, 2026

Checkpoint-engine is a simple middleware to update model weights in LLM inference engines

Python 943 83 Updated Feb 28, 2026

Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focusing mainly on infer…

281 15 Updated Mar 6, 2025

Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | A tiny BERT model can tell you the verbosity of an LLM (with low latency overhead!)

Jupyter Notebook 50 8 Updated Jun 1, 2024