TY  - JOUR
AU  - Awan, Ammar Ahmad
AU  - Hamidouche, Khaled
AU  - Hashmi, Jahanzeb Maqbool
AU  - Panda, Dhabaleswar K.
AD  - Dept. of Computer Science and Engg., The Ohio State University (awan.10@osu.edu; hamidouc@cse.ohio-state.edu; hashmi.29@osu.edu; panda@cse.ohio-state.edu)
AB  - Availability of large data sets like ImageNet and massively parallel computation support in modern HPC devices like NVIDIA GPUs have fueled a renewed interest in Deep Learning (DL) algorithms. This has triggered the development of DL frameworks like Caffe, Torch, TensorFlow, and CNTK. However, most DL frameworks have been limited to a single node. In order to scale out DL frameworks and bring HPC capabilities to the DL arena, we propose S-Caffe, a scalable and distributed Caffe adaptation for modern multi-GPU clusters. With an in-depth analysis of new requirements brought forward by the DL frameworks and limitations of current communication runtimes, we present a co-design of the Caffe framework and the MVAPICH2-GDR MPI runtime. Using the co-design methodology, we modify Caffe's workflow to maximize the overlap of computation
TI  - S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters
DA  - 2017-01-26
UR  - https://www.deepdyve.com/lp/association-for-computing-machinery/s-caffe-co-designing-mpi-runtimes-and-caffe-for-scalable-deep-learning-4fpSpS7SF0
DP  - DeepDyve
ER  - 