Figure: Mixed-precision CA-SGD outer iteration with per-kernel precision slots

Mixed-Precision Communication-Avoiding SGD for Generalized Linear Models on GPUs

Distributed SGD is limited by communication rather than computation, since each iteration requires an AllReduce across processes. We study mixed-precision communication-avoiding SGD (CA-SGD) for generalized linear models on NVIDIA GPUs, decomposing the local rounding error of one CA-SGD outer iteration into nine independent precision choices that depend on the hardware only through its low-precision unit roundoffs. On NERSC Perlmutter A100 GPUs, mixed-precision CA-SGD matches the loss of FP32 SGD to within 0.5% and reaches a 5.1x to 6.8x speedup over FP32 SGD on the epsilon, SUSY, HIGGS, synth, and Poisson-synth datasets.

June 2026 · Aditya Devarakonda, Irene Simó Muñoz, Giulia Guidi
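
To make the communication-avoiding idea concrete, here is a minimal NumPy sketch of one s-step outer iteration for logistic regression. It assumes the standard s-step reformulation (a Gram matrix over s stacked mini-batches followed by a short local recurrence) and equal-sized mini-batches; the function names are illustrative and are not taken from the paper. It runs entirely in double precision on one process, so it does not model the mixed-precision choices or the AllReduce itself.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sgd_steps(X, y, w0, batches, lr):
    """Reference: s plain mini-batch SGD steps for logistic regression."""
    w = w0.copy()
    for idx in batches:
        Xb, yb = X[idx], y[idx]
        r = sigmoid(Xb @ w) - yb             # per-sample residual of this batch
        w -= (lr / len(idx)) * (Xb.T @ r)    # gradient step
    return w


def ca_sgd_outer(X, y, w0, batches, lr):
    """One s-step outer iteration (illustrative sketch, equal batch sizes).

    G and v are the only quantities that involve products over the feature
    dimension. Assuming features are partitioned across processes, they
    could be formed with a single reduction per outer iteration instead of
    one reduction per SGD step.
    """
    s, b = len(batches), len(batches[0])
    Y = np.vstack([X[idx] for idx in batches])        # (s*b, d) stacked batches
    t = np.concatenate([y[idx] for idx in batches])   # stacked labels
    G = Y @ Y.T                                       # (s*b, s*b) Gram matrix
    v = Y @ w0                                        # products with the starting iterate

    r = np.zeros(s * b)
    for j in range(s):
        cur = slice(j * b, (j + 1) * b)
        prev = slice(0, j * b)
        # Recover X_j @ w_{j-1} from G and earlier residuals, without forming w_{j-1}.
        z = v[cur] - (lr / b) * (G[cur, prev] @ r[prev])
        r[cur] = sigmoid(z) - t[cur]

    # Apply all s deferred gradient updates at once.
    return w0 - (lr / b) * (Y.T @ r)
```

In exact arithmetic the two functions return the same iterate, so comparing them with `np.allclose` on random data is a quick sanity check of the recurrence.
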

Figure: GLMnet accuracy

Enhanced Cyclic Coordinate Descent Methods for Elastic Net Penalized Linear Models

We present a novel enhanced cyclic coordinate descent (ECCD) framework for solving generalized linear models with elastic net constraints that reduces training time compared with existing state-of-the-art methods. We redesign the CD method by performing a Taylor expansion around the current iterate, which avoids the nonlinear operations that arise in the gradient computation. This approximation lets us unroll the vector recurrences in the CD method and reformulate the resulting computations as more efficient batched computations.

October 2025 · Yixiao Wang, Zishan Shao, Ting Jiang, Aditya Devarakonda