Stars
A curated collection of fun and creative examples generated with Nano Banana & Nano Banana Pro🍌, Gemini-2.5-flash-image based model. We also release Nano-consistent-150K openly to support the commu…
AI productivity studio with smart chat, autonomous agents, and 300+ assistants. Unified access to frontier LLMs
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards.
HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model
Edit-R1: Reinforce Image Editing with Diffusion Negative-Aware Finetuning and MLLM Implicit Feedback
[ICLR 2026 Oral] DiffusionNFT: Online Diffusion Reinforcement with Forward Process
A project page template for academic papers. Demo at https://eliahuhorwitz.github.io/Academic-project-page-template/
Code release for Ming-UniVision: Joint Image Understanding and Generation with a Continuous Unified Tokenizer
Accepted paper lists for several conferences (including AI, ML, and robotics)
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
Extracted system prompts from ChatGPT (GPT-5.4, GPT-5.3, Codex), Claude (Opus 4.6, Sonnet 4.6, Claude Code), Gemini (3.1 Pro, 3 Flash, CLI), Grok (4.2, 4), Perplexity, and more. Updated regularly.
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Awesome Unified Multimodal Models
[🚀ICML 2025] "Taming Rectified Flow for Inversion and Editing" Using FLUX and HunyuanVideo for image and video editing!
Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
A curated list of publications on image and video segmentation leveraging Multimodal Large Language Models (MLLMs), highlighting state-of-the-art methods, innovative applications, and key advanceme…