Stars
SoulX-FlashTalk is the first 14B model to achieve sub-second start-up latency (0.87s) while maintaining a real-time throughput of 32 FPS on an 8xH800 node.
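🎦 Micam is an unofficial RTSP bridge service designed for Xiaomi cameras. It restreams Xiaomi camera video locally to an RTSP server and supports integration with NVR and smart-home systems such as HomeAssistant, Go2rtc, Frigate, Scrypted, and Homekit. The project uses Docker Compose for quick deployment, is based on Xiaomi's official Miloco, integrates Go2rtc to provide the RTSP streaming service, and runs without a GPU…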
A Fully Self-Hosted Solution for Full-Duplex Voice Interaction
Unlimited-length talking video generation that supports image-to-video and video-to-video generation
[ICCV 2025] Official Pytorch Implementation of FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait.
OpenAI Agents adapter for LiveKit
A tool for Container Debloating that removes bloat and improves performance.
LiveKit Agent integrated with MCP server of Home Assistant
Turns any OpenAI voice agent into a lively visual agent with bitHuman SDK
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model w/CPU ONNX and NVIDIA GPU PyTorch support, handling, and auto-stitching
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
A debugging and profiling tool that can trace and visualize Python code execution
coredumpy saves your crash site for post-mortem debugging
A lightweight, powerful framework for multi-agent workflows
Voice activity detector (VAD) for the browser with a simple API
Agno turns agents into production software. Build agents in any framework. Run as a service. Ship to real users.
[CVPR2025] We present StableAnimator, the first end-to-end ID-preserving video diffusion framework, which synthesizes high-quality videos without any post-processing, conditioned on a reference ima…
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
A framework for building realtime voice AI agents 🤖🎙️📹
Human: AI-powered 3D Face Detection & Rotation Tracking, Face Description & Recognition, Body Pose Tracking, 3D Hand & Finger Tracking, Iris Analysis, Age & Gender & Emotion Prediction, Gaze Tracki…
Playground Web UI using segment-anything-2 models from Meta.
Very low latency speech to text, intent recognition, and text to speech, for building voice agents and interfaces
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"