
Papers

1-Bit Stochastic Gradient Descent and its Application to Data-Parallel Distributed Training of Speech DNNs

Frank Seide et al. Microsoft Research Asia, Tsinghua University, Microsoft Research. INTERSPEECH 2014

1-Bit SGD with Error Feedback
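The core idea is to quantize each gradient coordinate down to a single bit while carrying the quantization error forward into the next step, so the error is eventually compensated rather than lost. A minimal NumPy sketch (the mean-magnitude reconstruction scale is illustrative, not the paper's exact scheme):

```python
import numpy as np

def one_bit_compress(grad, residual):
    """One step of 1-bit quantization with error feedback (sketch)."""
    corrected = grad + residual          # add quantization error carried over from last step
    signs = np.sign(corrected)
    signs[signs == 0] = 1.0              # break ties so every coordinate maps to +/-1
    scale = np.mean(np.abs(corrected))   # illustrative reconstruction scale
    decoded = signs * scale              # what the receiver reconstructs
    new_residual = corrected - decoded   # error fed back into the next step
    return signs, scale, new_residual
```

Only the sign bits and one scalar scale cross the network; the residual stays local on each worker.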

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

Yujun Lin et al. Tsinghua University. ICLR 2018

DGC
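DGC transmits only the largest-magnitude gradient coordinates and accumulates the rest locally, so small gradients are delayed rather than dropped. A minimal sketch of that top-k sparsification with local accumulation (the ratio and names are illustrative; DGC additionally uses momentum correction, gradient clipping, and warm-up, omitted here):

```python
import numpy as np

def topk_compress(grad, acc, ratio=0.01):
    """Top-k gradient sparsification with local accumulation (DGC-style sketch)."""
    acc = acc + grad                                 # fold new gradient into the local accumulator
    k = max(1, int(acc.size * ratio))
    idx = np.argpartition(np.abs(acc), -k)[-k:]      # indices of the k largest magnitudes
    values = acc[idx]                                # sparse payload to transmit
    acc[idx] = 0.0                                   # sent coordinates reset; the rest keep accumulating
    return idx, values, acc
```

With a 0.1% ratio, each worker exchanges roughly 1/1000 of the gradient volume per step.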

GRACE: A Compressed Communication Framework for Distributed Machine Learning

Hang Xu et al. KAUST. ICDCS 2021

GRACE
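GRACE's contribution is a unified framework in which different compression techniques plug into a common compress/decompress abstraction, making them interchangeable and comparable. A minimal sketch of such an interface (class and method names are illustrative, not GRACE's actual API):

```python
import numpy as np

class Compressor:
    """Illustrative compressor interface in the spirit of GRACE's abstraction."""
    def compress(self, tensor):
        """Return (payload, ctx): the data to send and whatever decompression needs."""
        raise NotImplementedError
    def decompress(self, payload, ctx):
        """Reconstruct a dense tensor from the transmitted payload."""
        raise NotImplementedError

class SignCompressor(Compressor):
    """Example plug-in: sign quantization with a single mean-magnitude scale."""
    def compress(self, tensor):
        signs = np.where(tensor >= 0, 1.0, -1.0)
        scale = np.mean(np.abs(tensor))
        return signs, scale
    def decompress(self, payload, ctx):
        return payload * ctx
```

A training loop written against the base interface can then swap quantizers and sparsifiers without touching the communication code.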

SIDCo: An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems

Ahmed M. Abdelmoniem et al. CEMSE, KAUST. MLSys 2021

SIDCo
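SIDCo avoids the cost of sorting or partitioning the full gradient by fitting a sparsity-inducing distribution to the gradient magnitudes and solving for the threshold that yields the target compression ratio. A single-stage sketch assuming an exponential fit (illustrative; the paper uses a multi-stage scheme with several candidate distributions):

```python
import numpy as np

def exp_threshold(grad, target_ratio):
    """Estimate a sparsification threshold from an exponential fit (sketch).

    For |g| ~ Exp(mu), P(|g| > t) = exp(-t / mu), so the threshold that
    keeps a target_ratio fraction of coordinates is t = -mu * ln(target_ratio).
    """
    mu = np.mean(np.abs(grad))           # MLE of the exponential scale parameter
    return -mu * np.log(target_ratio)
```

Coordinates with magnitude above the returned threshold are sent; the estimate costs one pass over the gradient instead of a top-k selection.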