Systems for AI Lab Georgia Tech

Research


2024

Mnemosyne: Parallelization Strategies for Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations
Amey Agrawal, Junda Chen, Íñigo Goiri, Ramachandran Ramjee, Chaojie Zhang, Alexey Tumanov, Esha Choukse
arXiv  ·  27 Sep 2024  ·  arxiv:2409.17264
SuperFedNAS: Cost-Efficient Federated Neural Architecture Search for On-Device Inference
Alind Khare, Animesh Agrawal, Aditya Annavajjala, Payman Behnam, Myungjin Lee, Hugo Latapie, Alexey Tumanov
18th European Conference on Computer Vision (ECCV 2024), Milano, Italy, Oct 2024  ·  12 Jul 2024  ·  arxiv:2301.10879
Metron: Holistic Performance Evaluation Framework for LLM Inference Systems
Amey Agrawal, Anmol Agarwal, Nitin Kedia, Jayashree Mohan, Souvik Kundu, Nipun Kwatra, Ramachandran Ramjee, Alexey Tumanov
arXiv  ·  10 Jul 2024  ·  arxiv:2407.07000
DεpS: Delayed ε-Shrinking for Faster Once-For-All Training
Aditya Annavajjala, Alind Khare, Animesh Agrawal, Igor Fedorov, Hugo Latapie, Myungjin Lee, Alexey Tumanov
18th European Conference on Computer Vision (ECCV 2024), Milano, Italy, Oct 2024  ·  09 Jul 2024  ·  arxiv:2407.06167
Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Alexey Tumanov, Ramachandran Ramjee
18th USENIX Symposium on Operating Systems Design and Implementation (OSDI’24), Santa Clara  ·  19 Jun 2024  ·  arxiv:2403.02310
Harmonica: Hybrid Accelerator to Overcome Imperfections of Mixed-signal DNN Accelerators
Payman Behnam, Uday Kamal, A. Shafiee, Alexey Tumanov, Saibal Mukhopadhyay
Proc. 38th IEEE International Parallel and Distributed Processing Symposium (IPDPS'24)  ·  27 May 2024
Vidur: A Large-Scale Simulation Framework For LLM Inference
Amey Agrawal, Nitin Kedia, Jayashree Mohan, Ashish Panwar, Nipun Kwatra, Bhargav Gulavani, Ramachandran Ramjee, Alexey Tumanov
7th Annual Conference on Machine Learning Systems (MLSys’24), Santa Clara  ·  22 May 2024  ·  arxiv:2405.05465

2023

SuperServe: Fine-Grained Inference Serving for Unpredictable Workloads
Alind Khare, Dhruv Garg, Sukrit Kalra, Snigdha Grandhi, Ion Stoica, Alexey Tumanov
22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI'25), Philadelphia, USA, 2025  ·  29 Dec 2023  ·  arxiv:2312.16733
TransEHR: Self-Supervised Transformer for Clinical Time Series Data
Yanbo Xu, Shangqing Xu, Manav Ramprassad, Alexey Tumanov, Chao Zhang
Proc. of Machine Learning for Health (ML4H'23)  ·  10 Dec 2023
Hardware–Software Co-Design for Real-Time Latency–Accuracy Navigation in Tiny Machine Learning Applications
Payman Behnam, Jianming Tong, Alind Khare, Yangyu Chen, Yue Pan, Pranav Gadikar, Abhimanyu Bambhaniya, Tushar Krishna, Alexey Tumanov
IEEE Micro  ·  01 Nov 2023  ·  doi:10.1109/MM.2023.3317243
ABKD: Graph Neural Network Compression with Attention-Based Knowledge Distillation
Anshul Ahluwalia, Rohit Das, Payman Behnam, Alind Khare, Pan Li, Alexey Tumanov
arXiv  ·  25 Oct 2023  ·  arxiv:2310.15938
DynaQuant: Compressing Deep Learning Training Checkpoints via Dynamic Quantization
Amey Agrawal, Sameer Reddy, Satwik Bhattamishra, Venkata Prabhakara Sarath Nookala, Vidushi Vashishth, Kexin Rong, Alexey Tumanov
15th ACM Symposium on Cloud Computing (SoCC 2024), Redmond, USA, Nov 2024  ·  06 Sep 2023  ·  arxiv:2306.11800
SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills
Amey Agrawal, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Ramachandran Ramjee
arXiv  ·  01 Sep 2023  ·  arxiv:2308.16369
Pareto-Secure Machine Learning (PSML): Fingerprinting and Securing Inference Serving Systems
Debopam Sanyal, Jui-Tse Hung, Manav Agrawal, Prahlad Jasti, Shahab Nikkhoo, Somesh Jha, Tianhao Wang, Sibin Mohan, Alexey Tumanov
arXiv  ·  08 Aug 2023  ·  arxiv:2307.01292
Subgraph Stationary Hardware-Software Inference Co-Design
Payman Behnam, Jianming Tong, Alind Khare, Yangyu Chen, Yue Pan, Pranav Gadikar, Abhimanyu Rajeshkumar Bambhaniya, Tushar Krishna, Alexey Tumanov
Proc. of Sixth Conference on Machine Learning and Systems (MLSys'23)  ·  03 Jul 2023  ·  arxiv:2306.17266
Signed-Binarization: Unlocking Efficiency Through Repetition-Sparsity Trade-Off
Sachit Kuhar, Alexey Tumanov, Judy Hoffman
Proc. of 3rd On-Device Intelligence Workshop, Machine Learning and Systems (MLSys'23)  ·  01 Jun 2023

2022

UnfoldML: Cost-Aware and Uncertainty-Based Dynamic 2D Prediction for Multi-Stage Classification
Yanbo Xu, Alind Khare, Glenn Matlin, Monish Ramadoss, Rishikesan Kamaleswaran, Chao Zhang, Alexey Tumanov
Proc. of 36th Conference on Neural Information Processing Systems (NeurIPS'22)  ·  31 Oct 2022  ·  arxiv:2210.15056
Enabling Real-time DNN Switching via Weight-Sharing
Jianming Tong, Yangyu Chen, Yue Pan, Abhimanyu Bambhaniya, Alind Khare, Taekyung Heo, Alexey Tumanov, Tushar Krishna
Proc. of 2nd Architecture, Compiler, and System Support for Multi-model DNN Workloads Workshop  ·  01 Jun 2022

2021

CompOFA: Compound Once-For-All Networks for Faster Multi-Platform Deployment
Manas Sahni, Shreya Varshini, Alind Khare, Alexey Tumanov
Proc. of International Conference on Learning Representations (ICLR'21)  ·  27 Apr 2021  ·  arxiv:2104.12642

2020

HOLMES: Health OnLine Model Ensemble Serving for Deep Learning Models in Intensive Care Units
Shenda Hong, Yanbo Xu, Alind Khare, Satria Priambada, Kevin Maher, Alaa Aljiffry, Jimeng Sun, Alexey Tumanov
Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining  ·  20 Aug 2020  ·  doi:10.1145/3394486.3403212