Hi, I am Shigeng Wang (王世耿). 🧠 Researcher @ Intel Labs China

My research sits at the intersection of LLM efficiency and real-world deployment: I work on quantization, compression, and hardware-aware inference to make foundation models faster and cheaper.
Research Directions
- LLM Quantization & Compression: post-training quantization, low-bit precision, layer-wise sensitivity analysis (a minimal sketch follows below)
- Efficient Inference: KV cache optimization, kernel fusion, hardware-aware deployment
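To give a flavor of the first direction, here is a minimal sketch of symmetric per-tensor int8 post-training quantization. It is illustrative only: the function names are mine, and real PTQ pipelines add calibration data, per-channel scales, and layer-wise error analysis on top of this core round-to-grid step.

```python
import torch

def quantize_int8(w: torch.Tensor):
    # Symmetric per-tensor int8 quantization: map the largest
    # absolute weight to 127 and round everything onto that grid.
    scale = w.abs().max() / 127
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor):
    # Recover an approximation of the original fp weights.
    return q.float() * scale

w = torch.randn(4096, 4096)
q, scale = quantize_int8(w)
err = (w - dequantize_int8(q, scale)).abs().max()
print(f"max quantization error: {err:.6f}")
```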
Languages & Frameworks
Python, PyTorch, CUDA, C/C++
ML/AI Tools
vLLM, llama.cpp (GGUF), OpenVINO, FlashAttention, PagedAttention
Education
- 2021.09 – 2026.06, Ph.D. in Computer Science, Beijing University of Posts and Telecommunications
- 2017.09 – 2021.06, B.Eng. in Data Science and Big Data Technology, Beijing University of Posts and Telecommunications

Experience
- 2024.04 – present, Research Intern, Intel Labs China, supervised by Anbang Yao
- 2023.10 – 2024.03, Research Intern, QCraft, focusing on autonomous driving perception
📧 [email protected]
🔗 Learn more on my personal website: genggng.github.io