I build production-ready computer vision and multimodal systems, taking research ideas through to real-world pipelines.
Currently at 2GIS, working on visual AI solutions. Previously at MTS (video analytics & moderation) and Skoltech (applied ML research).
- 🎥 Video understanding: scene detection, temporal segmentation, content moderation
- 🧠 Multimodal models: CLIP, BLIP, retrieval & captioning systems
- 🔍 Detection & segmentation: YOLO, Mask R-CNN, GroundingDINO, OWLv2
- ⚙️ ML system design: from offline pipelines to real-time inference
- 📦 Production ML: optimization (ONNX, TensorRT), batching, GPU pipelines
Python · PyTorch · OpenCV · MMDetection · Docker · Kubernetes · Kafka · MLflow
- Built a modular video analytics pipeline (scene detection → tracking → captioning → clustering)
- Designed CLIP-based scene grouping with temporal constraints
- Developed content moderation tools (alcohol, smoking, etc.) using multimodal models
- Worked on super-resolution and temporal action segmentation
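
To illustrate the idea behind scene grouping with temporal constraints, here is a minimal sketch (not the production code). It assumes each scene already has an embedding vector, for example from a CLIP image encoder, and that scenes arrive in temporal order. The names `cosine` and `group_scenes` and the `threshold` value are hypothetical, chosen only for this example. The temporal constraint modelled here is contiguity: a scene can only join the group that immediately precedes it.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_scenes(
    embeddings: Sequence[Sequence[float]],
    threshold: float = 0.8,  # hypothetical value; tune per dataset
) -> List[List[int]]:
    """Greedily merge temporally adjacent scenes with similar embeddings.

    A scene joins the current group when its similarity to the group's
    running mean embedding is at least `threshold`; otherwise it starts
    a new group. Groups are always contiguous in time.
    """
    groups: List[List[int]] = []
    centroid: List[float] = []
    for idx, emb in enumerate(embeddings):
        if groups and cosine(centroid, emb) >= threshold:
            n = len(groups[-1])
            # Update the running mean of the current group.
            centroid = [(c * n + e) / (n + 1) for c, e in zip(centroid, emb)]
            groups[-1].append(idx)
        else:
            groups.append([idx])
            centroid = list(emb)
    return groups


# Toy 2-D "embeddings": scene 4 resembles scenes 0-1 but is not adjacent to them.
scenes = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.0]]
print(group_scenes(scenes))
```

In this toy input the last scene looks like the first two but stays in its own group, because only adjacent scenes may merge. A real system might relax this with a maximum time gap instead of strict adjacency.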