View kongty's full-sized avatar
  • Stanford University


Production-grade Rust-native trading engine with deterministic event-driven architecture

Rust 22,237 2,701 Updated Apr 24, 2026

CGRA-Flow is an integrated framework for CGRA compilation, exploration, synthesis, and development.

Python 157 24 Updated Feb 18, 2026

NeuroSpector: Dataflow and Mapping Optimizer for Deep Neural Network Accelerators

C++ 21 3 Updated Mar 20, 2025

A PyTorch version of frustum-pointnets

Python 131 30 Updated Mar 18, 2020

HW Architecture-Mapping Design Space Exploration Framework for Deep Learning Accelerators

C++ 189 59 Updated Jan 23, 2026

A Python library for large-scale nearest neighbor computations via k-d trees and GPUs.

C 64 19 Updated Jun 21, 2022

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

Python 24,223 3,218 Updated Aug 15, 2024

Benchmark suite for embedded autonomous vehicle applications

C++ 17 8 Updated Dec 28, 2022

An end-to-end benchmark suite of multi-modal DNN applications for system-architecture co-design

Python 22 9 Updated Dec 13, 2024

Convert a PointPillars PyTorch model to ONNX for TensorRT inference

Python 408 84 Updated Nov 11, 2020

Official PyTorch implementation of "BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework"

Python 962 126 Updated Apr 5, 2023

Timeloop performs modeling, mapping and code-generation for tensor algebra workloads on various accelerator architectures.

C++ 478 131 Updated Feb 19, 2026

An analytical framework that models hardware dataflow of tensor applications on spatial architectures using the relation-centric notation.

C++ 89 13 Updated Apr 28, 2024
C 15 5 Updated Mar 17, 2026

Mnemosyne: Multi-Bank Memories for Heterogeneous Architectures

C++ 6 1 Updated Jun 25, 2021

🚀 A very efficient Texas Holdem GTO solver ♠️♥️♣️♦️

C++ 2,387 415 Updated Mar 31, 2026

This is the implementation of the paper [Optimus: Towards Optimal Layer-Fusion on Deep Learning Processors].

Python 9 6 Updated May 10, 2021
C++ 1,245 520 Updated Jan 19, 2026

Prototype-network-on-chip (ProNoC) is an EDA tool that facilitates prototyping of custom heterogeneous NoC-based many-core-SoC (MCSoC).

Verilog 63 23 Updated Dec 15, 2025
SystemVerilog 216 70 Updated Mar 30, 2026

RaveNoC is a configurable HDL NoC (Network-On-Chip) suitable for MPSoCs and different MP applications

SystemVerilog 191 39 Updated Nov 18, 2024

Hierarchical Deep Stereo Matching on High Resolution Images, CVPR 2019.

Python 425 77 Updated Jul 21, 2023

Memory Enhanced Global-Local Aggregation for Video Object Detection, CVPR 2020

Python 576 121 Updated May 13, 2021

RTL implementation of Flex-DPE.

Verilog 116 33 Updated Feb 22, 2020

Repository to host and maintain SCALE-Sim code

Python 452 148 Updated Feb 2, 2026

Implementation of a Tensor Processing Unit for embedded systems and the IoT.

VHDL 558 74 Updated Jan 5, 2019

An open-source reimplementation of Google's Tensor Processing Unit (TPU).

Python 747 94 Updated Dec 6, 2017

Classical equations and diagrams in machine learning

TeX 8,010 1,338 Updated Jul 30, 2024