Stars
The agent that grows with you
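Claude Code leaked source code - locally runnable version, with a new cross-platform desktop app that fills in Computer Use (includes an analysis of the core modules)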
OpenClaw-RL: Train any agent simply by talking
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
SkyRL: A Modular Full-stack RL Library for LLMs
[ICLR 2026] Tree Search for LLM Agent Reinforcement Learning
Code for the paper "BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping" by Zhiheng Xi et al.
Tongyi Deep Research, the Leading Open-source Deep Research Agent
[ICLR 2026] Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn Search Agents
[ICLR 2026] Agentic Reinforced Policy Optimization (ARPO)
[ICLR 2026] QeRL enables RL for 32B LLMs on a single H100 GPU.
Bridge Megatron-Core to Hugging Face/Reinforcement Learning
AI memory OS for LLM and Agent systems (moltbot, clawdbot, openclaw), enabling persistent Skill memory for cross-task skill reuse and evolution.
slime is an LLM post-training framework for RL Scaling.
A high-throughput and memory-efficient inference and serving engine for LLMs
Scalable toolkit for efficient model reinforcement
Muon is an optimizer for hidden layers in neural networks
Unleashing the Power of Reinforcement Learning for Math and Code Reasoners
Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
FireFlyer Record file format, writer and reader for DL training samples.
The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.
Official Repo for Open-Reasoner-Zero