Strategic research thinking agents for Claude Code — idea evaluation, project triage, and structured brainstorming. Helps you decide which papers to write, not just how to write them.

621 55 Updated Apr 13, 2026

[ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting"

Python 43 5 Updated Oct 9, 2025

[CVPR'25] Official implementation for paper - Contextual AD Narration with Interleaved Multimodal Sequence

Python 8 Updated May 1, 2025

[ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs

Python 168 10 Updated Nov 6, 2024

The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"

Python 92 1 Updated Oct 15, 2025

Python 806 48 Updated Jul 8, 2024

PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models

1,103 93 Updated Dec 15, 2025

[ICCV25 Oral] Token Activation Map to Visually Explain Multimodal LLMs

Python 184 10 Updated Dec 14, 2025

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.

Python 373 43 Updated Feb 28, 2026

🔥🔥 First-ever hour-scale video understanding models

Python 622 43 Updated Jul 14, 2025

This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025

Python 7,327 551 Updated May 5, 2025

Official PyTorch implementation of One-Minute Video Generation with Test-Time Training

Python 2,411 6 Updated Feb 25, 2026

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 19,051 1,739 Updated Jan 30, 2026

[ICML 2025] Official PyTorch implementation of LongVU

Python 425 36 Updated May 8, 2025

Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis

HTML 16 3 Updated May 28, 2025

Story-Based Retrieval with Contextual Embeddings. Largest freely available movie video dataset. [ACCV'20]

Python 197 29 Updated Sep 21, 2022

Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"

Python 879 75 Updated Aug 27, 2024

SpeechGPT Series: Speech Large Language Models

Python 1,407 96 Updated Jul 22, 2024

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 3,139 223 Updated May 19, 2025

Official PyTorch implementation of the paper "Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs"

Python 96 15 Updated Jun 6, 2025

[ACCV 2024] Official Implementation of "AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description". Junyu Xie, Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

Python 30 1 Updated Jan 28, 2025

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python 45,179 6,054 Updated Aug 16, 2024

[ICCV 2025] Official Implementation of "Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation". Junyu Xie, Tengda Han, Max Bain, Arsha Nagrani, Eshika Khandelwal, Gül Varol, W…

Python 22 2 Updated Jul 26, 2025

【ICLR 2024, Spotlight】Sentence-level Prompts Benefit Composed Image Retrieval

Python 93 9 Updated Apr 16, 2024

OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.

Python 327 25 Updated Apr 29, 2025

The suite of modeling video with Mamba

Python 293 31 Updated May 14, 2024

Official Implementation of LADS (Latent Augmentation using Domain descriptionS)

Python 50 9 Updated Apr 18, 2023

This repo contains the projects 'Virtual Normal', 'DiverseDepth', and '3D Scene Shape'. They address monocular depth estimation and 3D scene reconstruction from a single image.

Python 1,106 149 Updated Nov 10, 2023

Official PyTorch implementation of StyleGAN3

Python 6,923 1,238 Updated Sep 12, 2023

3D generation on ImageNet [ICLR 2023]

Python 214 8 Updated May 23, 2023