Akshita Gupta akshitac8

😑

Busy

ELLIS PhD student @TU-Darmstadt

273 followers · 457 following

TU-Darmstadt
Darmstadt
http://akshitac8.github.io/
@akshitac8

Highlights

Developer Program Member

Stars

andrehuang / research-companion

Strategic research thinking agents for Claude Code — idea evaluation, project triage, and structured brainstorming. Helps you decide which papers to write, not just how to write them.

621 stars · 55 forks · Updated Apr 13, 2026

lcqysl / FrameThinker

[ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting"

Python · 43 stars · 5 forks · Updated Oct 9, 2025

ant-research / UniAD

[CVPR'25] Official implementation for paper - Contextual AD Narration with Interleaved Multimodal Sequence

Python · 8 stars · Updated May 1, 2025

LALBJ / PAI

[ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs

Python · 168 stars · 10 forks · Updated Nov 6, 2024

zhang9302002 / ThinkingWithVideos

The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"

Python · 92 stars · 1 fork · Updated Oct 15, 2025

shikras / shikra

Python · 806 stars · 48 forks · Updated Jul 8, 2024

NVIDIA / audio-flamingo

PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models

1,103 stars · 93 forks · Updated Dec 15, 2025

xmed-lab / TAM

[ICCV25 Oral] Token Activation Map to Visually Explain Multimodal LLMs

Python · 184 stars · 10 forks · Updated Dec 14, 2025

zjysteven / lmms-finetune

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.

Python · 373 stars · 43 forks · Updated Feb 28, 2026

VectorSpaceLab / Video-XL

🔥🔥First-ever hour scale video understanding models

Python · 622 stars · 43 forks · Updated Jul 14, 2025

apple / ml-fastvlm

This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025

Python · 7,327 stars · 551 forks · Updated May 5, 2025

test-time-training / ttt-video-dit

Official PyTorch implementation of One-Minute Video Generation with Test-Time Training

Python · 2,411 stars · 6 forks · Updated Feb 25, 2026

QwenLM / Qwen3-VL

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook · 19,051 stars · 1,739 forks · Updated Jan 30, 2026

Vision-CAIR / LongVU

[ICML 2025] Official PyTorch implementation of LongVU

Python · 425 stars · 36 forks · Updated May 8, 2025

apple / visatronic-demo

Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis

HTML · 16 stars · 3 forks · Updated May 28, 2025

m-bain / CondensedMovies

Story-Based Retrieval with Contextual Embeddings. Largest freely available movie video dataset. [ACCV'20]

Python · 197 stars · 29 forks · Updated Sep 21, 2022

OpenMOSS / AnyGPT

Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"

Python · 879 stars · 75 forks · Updated Aug 27, 2024

0nutation / SpeechGPT

SpeechGPT Series: Speech Large Language Models

Python · 1,407 stars · 96 forks · Updated Jul 22, 2024

ictnlp / LLaMA-Omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python · 3,139 stars · 223 forks · Updated May 19, 2025

lucas-ventura / chapter-llama

Official PyTorch implementation of the paper "Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs"

Python · 96 stars · 15 forks · Updated Jun 6, 2025

Jyxarthur / AutoAD-Zero

[ACCV 2024] Official Implementation of "AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description". Junyu Xie, Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

Python · 30 stars · 1 fork · Updated Jan 28, 2025

coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python · 45,179 stars · 6,054 forks · Updated Aug 16, 2024

Jyxarthur / shot-by-shot

[ICCV 2025] Official Implementation of "Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation". Junyu Xie, Tengda Han, Max Bain, Arsha Nagrani, Eshika Khandelwal, Gül Varol, W…

Python · 22 stars · 2 forks · Updated Jul 26, 2025

chunmeifeng / SPRC

【ICLR 2024, Spotlight】Sentence-level Prompts Benefit Composed Image Retrieval

Python · 93 stars · 9 forks · Updated Apr 16, 2024

sming256 / OpenTAD

OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.

Python · 327 stars · 25 forks · Updated Apr 29, 2025

OpenGVLab / video-mamba-suite

The suite of modeling video with Mamba

Python · 293 stars · 31 forks · Updated May 14, 2024

lisadunlap / LADS

Official Implementation of LADS (Latent Augmentation using Domain descriptionS)

Python · 50 stars · 9 forks · Updated Apr 18, 2023

aim-uofa / AdelaiDepth

This repo contains the projects: 'Virtual Normal', 'DiverseDepth', and '3D Scene Shape'. They aim to solve the monocular depth estimation, 3D scene reconstruction from single image problems.

Python · 1,106 stars · 149 forks · Updated Nov 10, 2023

NVlabs / stylegan3

Official PyTorch implementation of StyleGAN3

Python · 6,923 stars · 1,238 forks · Updated Sep 12, 2023

snap-research / 3dgp

3D generation on ImageNet [ICLR 2023]

Python · 214 stars · 8 forks · Updated May 23, 2023
