
Hi there 👋

🔭 I’m currently working on visual perception, and my long-term goal is to build general foundation models.

⚡ Recently I’ve been focusing on vision-language models and unified visual models.

📫 If you’re also interested in these topics, feel free to reach out!

Pinned Repositories

  1. Oryx-mllm/Oryx

     [ICLR 2025] MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution

     Python · 330 stars · 19 forks

  2. Insight-V

     [CVPR 2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

     Python · 241 stars · 6 forks

  3. MME-Benchmarks/Video-MME-v2

     Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

     Python · 355 stars · 1 fork

  4. Ola-Omni/Ola

     Ola: Pushing the Frontiers of Omni-Modal Language Model

     Python · 389 stars · 16 forks

  5. Octopus

     [ECCV 2024] 🐙 Octopus, an embodied vision-language model trained with RLEF, excelling at embodied visual planning and programming.

     Python · 298 stars · 20 forks

  6. open-compass/VLMEvalKit

     Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

     Python · 4.1k stars · 684 forks