Parallel workflow

Communication-Avoiding Linear Algebraic Kernel K-Means on GPUs

Clustering is an important tool in data analysis, with K-means being popular for its simplicity and versatility. However, K-means cannot separate clusters that are not linearly separable. Kernel K-means addresses this limitation but requires a large kernel matrix, making it computationally and memory intensive. Prior work has accelerated Kernel K-means by formulating it using sparse linear algebra primitives and implementing it on a single GPU. However, that approach cannot run on datasets with more than approximately 80,000 samples due to limited GPU memory. In this work, we address this issue by presenting a suite of distributed-memory parallel algorithms for large-scale Kernel K-means clustering on multi-GPU systems.

January 2026 · Julian Bellavita, Matthew Rubino, Nakul Iyer, Andrew Chang, Aditya Devarakonda, Flavio Vella, Giulia Guidi

Communication-avoiding s-step dual coordinate descent

Scalable Dual Coordinate Descent for Kernel Methods

We develop scalable dual coordinate descent (DCD) and block dual coordinate descent (BDCD) methods for kernel support vector machines and kernel ridge regression. We derive s-step variants that reduce communication frequency by a tunable factor of s while computing the same solution in exact arithmetic, achieving strong scaling speedups of up to 9.8x over existing methods on up to 512 cores. This paper received the Outstanding Paper Award at HPC Asia 2025.

January 2025 · Zishan Shao, Aditya Devarakonda

2D parallel SGD communication trade-off

Communication-Efficient, 2D Parallel Stochastic Gradient Descent for Distributed-Memory Optimization

This work generalizes 1D s-step SGD and 1D Federated SGD with Averaging (FedAvg) to yield a 2D parallel SGD method (HybridSGD) that attains a continuous performance trade-off between the two baseline algorithms. We present theoretical analysis of the convergence, computation, communication, and memory trade-offs, and a C++/MPI implementation that achieves speedups of up to 5.3x over s-step SGD and up to 121x over FedAvg on a Cray EX system.

January 2025 · Aditya Devarakonda, Ramakrishnan Kannan