Communication-Avoiding Linear Algebraic Kernel K-Means on GPUs

Clustering is an important tool in data analysis, with K-means being popular for its simplicity and versatility. However, it cannot handle non-linearly separable clusters. Kernel K-means addresses this limitation but requires a large kernel matrix, making it computationally and memory intensive. Prior work has accelerated Kernel K-means by formulating it using sparse linear algebra primitives and implementing it on a single GPU. However, that approach cannot run on datasets with more than approximately 80,000 samples due to limited GPU memory. In this work, we address this issue by presenting a suite of distributed-memory parallel algorithms for large-scale Kernel K-means clustering on multi-GPU systems.

January 2026 · Julian Bellavita, Matthew Rubino, Nakul Iyer, Andrew Chang, Aditya Devarakonda, Flavio Vella, Giulia Guidi
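
The Kernel K-means formulation mentioned in the entry above can be illustrated with a minimal single-node sketch. It is not the authors' implementation: it uses dense NumPy arrays in place of sparse GPU primitives, and the function names (`rbf_kernel`, `kernel_kmeans`), the RBF kernel choice, and the parameters `gamma`, `n_iter`, and `seed` are assumptions made for illustration. The point-to-centroid distances in feature space are obtained from one matrix product between the kernel matrix and a scaled cluster-assignment matrix.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Dense RBF kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_kmeans(K, k, n_iter=50, seed=0):
    """Kernel K-means on a precomputed n x n kernel matrix K (illustrative sketch)."""
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=n)  # random initial assignment
    for _ in range(n_iter):
        # V is a k x n assignment matrix: V[j, i] = 1/|C_j| if point i is in cluster j.
        V = np.zeros((k, n))
        V[labels, np.arange(n)] = 1.0
        sizes = V.sum(axis=1)
        V /= np.maximum(sizes, 1.0)[:, None]

        # KV[i, j] is the mean kernel value between point i and cluster j.
        KV = K @ V.T
        # c[j] = (V K V^T)[j, j], the squared norm of centroid j in feature space.
        c = np.einsum("jn,nj->j", V, KV)

        # Squared distance up to the per-point constant K[i, i],
        # which does not affect the argmin over clusters.
        dist = c[None, :] - 2.0 * KV
        dist[:, sizes == 0] = np.inf  # never assign points to an empty cluster

        new_labels = np.argmin(dist, axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Small usage example on random 2-D points.
X = np.random.default_rng(1).normal(size=(60, 2))
K = rbf_kernel(X, gamma=0.5)
labels = kernel_kmeans(K, k=3)
```

The kernel matrix grows quadratically with the number of samples: a dense 80,000 × 80,000 matrix has 6.4 billion entries, which is roughly 25.6 GB in single precision.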

DistShap: Scalable GNN Explanations with Distributed Shapley Values

We propose DistShap, a parallel algorithm that distributes Shapley value-based explanations of graph neural network predictions across multiple GPUs. DistShap samples subgraphs in a distributed setting, executes GNN inference in parallel across GPUs, and solves a distributed least squares problem to compute edge importance scores, scaling to GNN models with millions of features on up to 128 GPUs.

June 2025 · Selahattin Akkas, Aditya Devarakonda, Ariful Azad
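
The sample, infer, and solve pattern described in the DistShap entry above can be sketched on a single node as follows. This is a simplified illustration, not DistShap itself: `toy_model` is a hypothetical additive stand-in for GNN inference, edge masks are sampled uniformly, and the fit is an unweighted least squares problem without any Shapley-specific sample weighting. For an additive model like this one, the Shapley value of each edge equals its weight, so the least squares fit recovers the scores exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
num_edges = 8      # hypothetical number of edges in the explained subgraph
num_samples = 200  # hypothetical number of sampled edge masks

# Hypothetical stand-in for GNN inference: an additive function of the kept edges.
true_w = rng.normal(size=num_edges)

def toy_model(masks):
    """Return a prediction for each row of a 0/1 edge-mask matrix."""
    return 0.5 + masks @ true_w

# 1. Sample subgraphs as random binary edge masks (1 = edge kept).
masks = rng.integers(0, 2, size=(num_samples, num_edges)).astype(float)

# 2. Run "inference" on every sampled subgraph.
preds = toy_model(masks)

# 3. Fit a linear surrogate with an intercept by least squares;
#    the per-edge coefficients serve as the edge importance scores.
A = np.hstack([np.ones((num_samples, 1)), masks])
coef, *_ = np.linalg.lstsq(A, preds, rcond=None)
base_value, edge_scores = coef[0], coef[1:]
```

In a multi-GPU setting, steps 1 and 2 are independent across samples, so the rows of `masks` could be partitioned across devices, leaving the least squares solve as the step that needs coordination.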