2D parallel SGD communication trade-off

Communication-Efficient, 2D Parallel Stochastic Gradient Descent for Distributed-Memory Optimization

This work generalizes 1D s-step SGD and 1D Federated SGD with Averaging (FedAvg) into a 2D parallel SGD method (HybridSGD) that attains a continuous performance trade-off between the two baseline algorithms. We present a theoretical analysis of its convergence, computation, communication, and memory trade-offs, along with a C++/MPI implementation that achieves speedups of up to 5.3x over s-step SGD and up to 121x over FedAvg on a Cray EX system.

January 2025 · Aditya Devarakonda, Ramakrishnan Kannan