TY - GEN
T1 - Quasi fat trees for HPC clouds and their fault-resilient closed-form routing
AU - Zahavi, Eitan
AU - Keslassy, Isaac
AU - Kolodny, Avinoam
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/10/15
Y1 - 2014/10/15
AB - High-Performance Computing (HPC) Clusters and Data Center Networks often rely on fat-tree topologies. However, fat trees and their known variants are not designed for concurrent small jobs. As a result, in recent years, HPC designers have introduced ad-hoc topologies to offer better performance for these concurrent small jobs. In this paper, we present and formally define these topologies, which we call Quasi Fat Trees (QFTs). Specifically, we formulate the graph structure of these new topologies, and show that they perform better for concurrent small jobs. Furthermore, we derive a closed-form and fault-resilient contention-free routing algorithm for all global shift permutations. This routing optimizes the run-time of large computing jobs that utilize MPI collectives. Finally, we verify the algorithm by running its implementation as an OpenSM routing engine on various sizes of QFT topologies, and show that it exhibits good performance.
KW - Fat Tree
KW - HPC
KW - Routing
KW - Topology
UR - http://www.scopus.com/inward/record.url?scp=84918832474&partnerID=8YFLogxK
U2 - 10.1109/HOTI.2014.19
DO - 10.1109/HOTI.2014.19
M3 - Conference contribution
AN - SCOPUS:84918832474
T3 - Proceedings - 2014 IEEE 22nd Annual Symposium on High-Performance Interconnects, HOTI 2014
SP - 41
EP - 48
BT - Proceedings - 2014 IEEE 22nd Annual Symposium on High-Performance Interconnects, HOTI 2014
T2 - 22nd IEEE Annual Symposium on High-Performance Interconnects, HOTI 2014
Y2 - 26 August 2014 through 28 August 2014
ER -