Starred repositories
[ICLR'26] AutoGEO: a Generative Engine Optimization framework that automatically learns generative engine preferences and rewrites web content for more traction.
Web UI for training and running open models like Gemma 4, Qwen3.5, DeepSeek, gpt-oss locally.
A Claude Code SKILL for designing beautiful, consistent web pages — spec first, code second.
Build, Evaluate, and Deploy GUI Agents — online RL training, standardized benchmarks, and real-device deployment in one framework.
Efficient and unified implementations for TopK-based sparse attention
Offline agentic workflow for generating ad creatives
Industrial-grade video recommendation system: Two-Tower recall + Faiss + DeepFM/DIN ranking
Trained a two-tower ranker and distilled it into a student ranker. Reranking to enforce item diversity, category diversity/coverage, and exposure fairness for categories/new items. Found the best s…
Feature engineering for big data and quick inference
CoreIR project: Reproduction of "Query Auto-Completion for Rare Prefixes"
Virtually every modern search application features some kind of query auto-completion. In its basic form, the problem consists of retrieving from a string set a small number of completions, i.e. st…
This project is a feature-rich, responsive clone of the Netflix UI, supercharged with an AI-powered movie recommendation engine. It allows users to browse popular and trending movies, watch trailer…
CroQS: a Benchmark for Cross-modal Query Suggestion
Simulation-based Interactive Query Suggestion Evaluation
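The Xianyu auto-reply management system is an automated customer-service system built with Python + FastAPI and designed specifically for the Xianyu platform. It connects to the Xianyu servers over WebSocket, receives and processes messages in real time, and provides intelligent automatic replies.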
Big Tech recommender systems study (Meta HSTU, MS Recommenders, NVIDIA recsys-examples, YouTube)
Use your Claude Max subscription with OpenCode, Pi, Droid, Aider, Crush, Cline. Proxy that bridges Anthropic's official SDK to enable Claude Max in third-party tools.
A single CLAUDE.md file to improve Claude Code behavior, derived from Andrej Karpathy's observations on LLM coding pitfalls.
Custom LLM inference kernels in Triton & CUDA C++: Flash Attention (beats torch SDPA at seqlen≥1024), int8 GEMM+dequant (104% of fp16 cuBLAS at M=2048), fused RMSNorm+Linear. Benchmarked on NVIDIA …
SGLang speculative decoding on 4× NVIDIA A30 — implemented lossless draft/verify with a RadixAttention-safe provisional KV cache (insert/commit/evict), accept-reject sampling verified mathematicall…
1.95× faster LLM inference via compiler-level kernel fusion