1-Bit Stochastic Gradient Descent and its Application to Data-Parallel Distributed Training of Speech DNNs
Frank Seide et al. Microsoft Research Asia, Tsinghua University, Microsoft Research. INTERSPEECH 2014
1-Bit SGD with Error Feedback
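The key idea in this paper is quantizing each gradient element down to a single bit while feeding the quantization error back into the next minibatch's gradient, so the error is compensated over time rather than lost. A minimal NumPy sketch of that idea (the paper itself quantizes per-column with data-derived reconstruction values; the function name and the simple mean-magnitude scales here are illustrative assumptions):

```python
import numpy as np

def one_bit_quantize(grad, error):
    """Sketch of 1-bit quantization with error feedback.

    grad  : current gradient vector
    error : residual left over from the previous step's quantization
    Returns the decoded (dequantized) gradient actually communicated
    and the new residual to carry into the next step.
    """
    corrected = grad + error  # add back what quantization lost last time
    # Only the sign is transmitted (1 bit/element); the receiver
    # reconstructs using one scale for positives, one for negatives.
    pos = corrected[corrected >= 0]
    neg = corrected[corrected < 0]
    scale_pos = pos.mean() if pos.size else 0.0
    scale_neg = neg.mean() if neg.size else 0.0
    decoded = np.where(corrected >= 0, scale_pos, scale_neg)
    new_error = corrected - decoded  # remember what was lost this time
    return decoded, new_error
```

By construction, `decoded + new_error` equals the error-corrected gradient, which is what makes the scheme unbiased over successive steps.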
Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
Yujun Lin et al. Tsinghua University. ICLR 2018
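Deep Gradient Compression builds on top-k sparsification: each worker sends only the largest-magnitude gradient entries and accumulates the rest locally until they grow large enough to be sent. A minimal sketch of that core mechanism (omitting the paper's momentum correction, gradient clipping, and warm-up tricks; the function name is an assumption):

```python
import numpy as np

def topk_sparsify(grad, residual, k):
    """Sketch of top-k gradient sparsification with local accumulation.

    grad     : current gradient vector
    residual : entries withheld from previous steps
    k        : number of entries to transmit this step
    Returns the sparse vector to communicate and the new residual.
    """
    acc = grad + residual  # accumulate withheld gradient mass
    # indices of the k largest-magnitude entries
    idx = np.argpartition(np.abs(acc), -k)[-k:]
    sparse = np.zeros_like(acc)
    sparse[idx] = acc[idx]          # send only the top-k entries
    new_residual = acc - sparse     # keep the rest for later steps
    return sparse, new_residual
```

With k a small fraction of the model size (the paper reports compression ratios in the hundreds), the communicated volume drops accordingly while the residual accumulation preserves convergence.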
GRACE: A Compressed Communication Framework for Distributed Machine Learning
Hang Xu et al. 2021
SIDCo: An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems
Ahmed M. Abdelmoniem et al. CEMSE, KAUST. MLSys 2021