C+MPI and Spark parallel efficiency comparison

Matrix Factorizations at Scale: A Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies

This work compares Apache Spark with traditional C+MPI implementations of three matrix factorizations, nonnegative matrix factorization (NMF), principal component analysis (PCA), and CX decomposition, applied to particle physics, climate modeling, and bioimaging data. The experiments scale up to 1600 Cray XC40 nodes, and the authors provide tuning guidance for high-performance scientific data analytics.

December 2016 · Alex Gittens, Aditya Devarakonda, Evan Racah, Michael Ringenburg, Lisa Gerhardt, Jey Kottalam, Jialin Liu, Kristyn Maschhoff, Shane Canon, Jatin Chhugani, Pramod Sharma, Jianlin Yang, James Demmel, Jim Harrell, Vijay Krishnamurthy, Michael W. Mahoney, Prabhat