Systems for AI Lab Georgia Tech

Research

2024

Harmonica: Hybrid Accelerator to Overcome Imperfections of Mixed-signal DNN Accelerators
Payman Behnam, Uday Kamal, A. Shafiee, Alexey Tumanov, Saibal Mukhopadhyay
Proc. 38th IEEE International Parallel and Distributed Processing Symposium (IPDPS'24)  ·  27 May 2024
Vidur: A Large Scale Simulation Framework For LLM Inference
Amey Agrawal, Nitin Kedia, Jayashree Mohan, Ashish Panwar, Nipun Kwatra, Bhargav S. Gulavani, Ramachandran Ramjee, Alexey Tumanov
7th Annual Conference on Machine Learning and Systems (MLSys'24), Santa Clara  ·  13 May 2024
Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Alexey Tumanov, Ramachandran Ramjee
arXiv  ·  05 Mar 2024  ·  arxiv:2403.02310

2023

SuperServe: Fine-Grained Inference Serving for Unpredictable Workloads
Alind Khare, Dhruv Garg, Sukrit Kalra, Snigdha Grandhi, Ion Stoica, Alexey Tumanov
arXiv  ·  29 Dec 2023  ·  arxiv:2312.16733
TransEHR: Self-Supervised Transformer for Clinical Time Series Data
Yanbo Xu, Shangqing Xu, Manav Ramprassad, Alexey Tumanov, Chao Zhang
Proc. of Machine Learning for Health (ML4H'23)  ·  10 Dec 2023
Hardware–Software Co-Design for Real-Time Latency–Accuracy Navigation in Tiny Machine Learning Applications
Payman Behnam, Jianming Tong, Alind Khare, Yangyu Chen, Yue Pan, Pranav Gadikar, Abhimanyu Bambhaniya, Tushar Krishna, Alexey Tumanov
IEEE Micro  ·  01 Nov 2023  ·  doi:10.1109/MM.2023.3317243
ABKD: Graph Neural Network Compression with Attention-Based Knowledge Distillation
Anshul Ahluwalia, Rohit Das, Payman Behnam, Alind Khare, Pan Li, Alexey Tumanov
arXiv  ·  25 Oct 2023  ·  arxiv:2310.15938
DynaQuant: Compressing Deep Learning Training Checkpoints via Dynamic Quantization
Amey Agrawal, Sameer Reddy, Satwik Bhattamishra, Venkata Prabhakara Sarath Nookala, Vidushi Vashishth, Kexin Rong, Alexey Tumanov
arXiv  ·  06 Sep 2023  ·  arxiv:2306.11800
SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills
Amey Agrawal, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Ramachandran Ramjee
arXiv  ·  01 Sep 2023  ·  arxiv:2308.16369
Pareto-Secure Machine Learning (PSML): Fingerprinting and Securing Inference Serving Systems
Debopam Sanyal, Jui-Tse Hung, Manav Agrawal, Prahlad Jasti, Shahab Nikkhoo, Somesh Jha, Tianhao Wang, Sibin Mohan, Alexey Tumanov
arXiv  ·  08 Aug 2023  ·  arxiv:2307.01292
Subgraph Stationary Hardware-Software Inference Co-Design
Payman Behnam, Jianming Tong, Alind Khare, Yangyu Chen, Yue Pan, Pranav Gadikar, Abhimanyu Rajeshkumar Bambhaniya, Tushar Krishna, Alexey Tumanov
Proc. of Sixth Conference on Machine Learning and Systems (MLSys'23)  ·  03 Jul 2023  ·  arxiv:2306.17266
Signed-Binarization: Unlocking Efficiency Through Repetition-Sparsity Trade-Off
Sachit Kuhar, Alexey Tumanov, Judy Hoffman
Proc. of 3rd On-Device Intelligence Workshop, Machine Learning and Systems (MLSys'23)  ·  01 Jun 2023
SuperFed: Weight Shared Federated Learning
Alind Khare, Animesh Agrawal, Myungjin Lee, Alexey Tumanov
arXiv  ·  27 Jan 2023  ·  arxiv:2301.10879

2022

UnfoldML: Cost-Aware and Uncertainty-Based Dynamic 2D Prediction for Multi-Stage Classification
Yanbo Xu, Alind Khare, Glenn Matlin, Monish Ramadoss, Rishikesan Kamaleswaran, Chao Zhang, Alexey Tumanov
Proc. of 36th Conference on Neural Information Processing Systems (NeurIPS'22)  ·  31 Oct 2022  ·  arxiv:2210.15056
Enabling Real-time DNN Switching via Weight-Sharing
Jianming Tong, Yangyu Chen, Yue Pan, Abhimanyu Bambhaniya, Alind Khare, Taekyung Heo, Alexey Tumanov, Tushar Krishna
Proc. of 2nd Architecture, Compiler, and System Support for Multi-model DNN Workloads Workshop  ·  01 Jun 2022

2021

CompOFA: Compound Once-For-All Networks for Faster Multi-Platform Deployment
Manas Sahni, Shreya Varshini, Alind Khare, Alexey Tumanov
Proc. of International Conference on Learning Representations (ICLR'21)  ·  27 Apr 2021  ·  arxiv:2104.12642

2020

HOLMES: Health OnLine Model Ensemble Serving for Deep Learning Models in Intensive Care Units
Shenda Hong, Yanbo Xu, Alind Khare, Satria Priambada, Kevin Maher, Alaa Aljiffry, Jimeng Sun, Alexey Tumanov
Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining  ·  20 Aug 2020  ·  doi:10.1145/3394486.3403212