Stars
Use claude-code for free in the terminal, VSCode extension, or Discord like OpenClaw (voice supported)
[CVPR 2026] Instance-level Visual Active Tracking with Occlusion-Aware Planning
GPT Image 2 prompt gallery, image prompt library, agentic skill, and CLI for OpenAI image generation/editing
A cross-platform desktop All-in-One assistant tool for Claude Code, Codex, OpenCode, openclaw & Gemini CLI.
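The best Chinese edition of Google's new book on agent design patterns (Agentic Design Patterns), continuously improved. Includes online reading and PDF and EPUB e-book downloads.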
Everything about the SmolLM and SmolVLM family of models
OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
A curated list of works related to Misinformation Video Detection, as a companion material for an ACM Multimedia 2023 survey
Use Garry Tan's exact Claude Code setup: 23 opinionated tools that serve as CEO, Designer, Eng Manager, Release Manager, Doc Engineer, and QA
JavaScript in-page GUI agent. Control web interfaces with natural language.
"OpenSpace: Make Your Agents: Smarter, Low-Cost, Self-Evolving" -- Community: https://open-space.cloud/
"ClawTeam: Agent Swarm Intelligence" (One Command → Full Automation)
This project presents a multimodal machine learning system for detecting fake news using text, image, and video data.
Official repository for Robust Multimodal Large Language Models Against Modality Conflict
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
[ICLR 2026 Oral] Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning.
AI-Generated Video Detection via Perception Pretext Reinforcement Learning
[AAAI 2026] GenVidBench: A 6-Million Benchmark for AI-Generated Video Detection
LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs
[CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
[ICLR 2025] OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
"CLI-Anything: Making ALL Software Agent-Native" -- CLI-Hub: https://clianything.cc/
Twinkle✨: Training workbench to make your model glow.