LSQ: Load Balancing in Large-Scale Heterogeneous Systems With Multiple Dispatchers

Shay Vargaftik, Isaac Keslassy, Ariel Orda

Research output: Contribution to journal › Article › peer-review

23 Scopus citations

Abstract

Nowadays, the efficiency and even the feasibility of traditional load-balancing policies are challenged by the rapid growth of cloud infrastructure and by increasing levels of server heterogeneity. In such heterogeneous systems with many load balancers, traditional solutions such as JSQ (Join the Shortest Queue) incur a prohibitively large communication overhead and detrimental incast effects due to herd behavior. Alternative low-communication policies, such as JSQ(d) and the recently proposed JIQ (Join the Idle Queue), are either unstable or provide poor performance. We introduce the Local Shortest Queue (LSQ) family of load-balancing algorithms. In these algorithms, each dispatcher maintains its own local, and possibly outdated, view of the server queue lengths and applies JSQ to this local view. A small communication overhead is used infrequently to update the local view. We formally prove that as long as the error in these local estimates of the server queue lengths is bounded in expectation, the entire system is strongly stable. Finally, in simulations, we show that simple and stable LSQ policies exhibit appealing performance and significantly outperform existing low-communication policies while using an equivalent communication budget. In particular, our simple policies often outperform even JSQ because they reduce herd behavior. We also show how relying on smart servers (i.e., advanced pull-based communication) further improves performance and lowers communication overhead.
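The core LSQ idea, a dispatcher running JSQ on its own local and possibly outdated view of the queues, can be illustrated with a small Python sketch. This is a minimal discrete-time illustration under stated assumptions, not the authors' implementation: the names `Dispatcher` and `simulate`, the Bernoulli arrival and service model, and the update rule (each dispatcher occasionally samples one random server and overwrites its stale entry with that server's true queue length) are hypothetical choices made only for illustration.

```python
import random


class Dispatcher:
    """A dispatcher holding its own, possibly stale, view of server queue lengths.

    Hypothetical illustration of the LSQ idea; not the paper's code.
    """

    def __init__(self, n_servers: int) -> None:
        self.local_view = [0] * n_servers

    def dispatch(self, rng: random.Random) -> int:
        # JSQ applied to the LOCAL view: pick a server with the smallest
        # estimated queue, breaking ties uniformly at random.
        best = min(self.local_view)
        candidates = [i for i, q in enumerate(self.local_view) if q == best]
        target = rng.choice(candidates)
        # The dispatcher knows it just sent a job there, so it bumps its estimate.
        self.local_view[target] += 1
        return target

    def refresh(self, server: int, true_length: int) -> None:
        # Infrequent, low-cost update: overwrite one stale entry with the true value.
        self.local_view[server] = true_length


def simulate(n_dispatchers: int, service_probs: list[float], arrival_prob: float,
             update_prob: float, steps: int, seed: int = 0):
    """Discrete-time sketch (assumed model): in each slot, every dispatcher receives
    a job with probability arrival_prob, every non-empty server i completes a job
    with probability service_probs[i], and every dispatcher, with probability
    update_prob, samples one random server and refreshes its view of it."""
    rng = random.Random(seed)
    n = len(service_probs)
    queues = [0] * n  # true queue lengths, which dispatchers never see directly
    dispatchers = [Dispatcher(n) for _ in range(n_dispatchers)]
    arrived = completed = 0
    for _ in range(steps):
        for d in dispatchers:
            if rng.random() < arrival_prob:
                queues[d.dispatch(rng)] += 1
                arrived += 1
        for i, mu in enumerate(service_probs):  # heterogeneous service rates
            if queues[i] > 0 and rng.random() < mu:
                queues[i] -= 1
                completed += 1
        for d in dispatchers:
            if rng.random() < update_prob:
                s = rng.randrange(n)
                d.refresh(s, queues[s])
    return queues, dispatchers, arrived, completed


if __name__ == "__main__":
    queues, dispatchers, arrived, completed = simulate(
        n_dispatchers=4, service_probs=[0.9, 0.6, 0.3, 0.3],
        arrival_prob=0.4, update_prob=0.05, steps=10_000)
    # Jobs are conserved: what is still queued equals arrivals minus completions.
    print(sum(queues), arrived - completed)
```

Because each dispatcher only increments its own estimate when it sends a job, and only learns about departures when it refreshes, its view drifts between updates; the abstract's stability condition is stated in terms of the expected error between such local views and the true queue lengths. Different dispatchers also hold different views, so they are less likely to pick the same server at the same moment, which is one plausible reading of how LSQ reduces herd behavior.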

Original language: English
Article number: 9051660
Pages (from-to): 1186-1198
Number of pages: 13
Journal: IEEE/ACM Transactions on Networking
Volume: 28
Issue number: 3
State: Published, Jun 2020

Keywords

  • Local shortest queue
  • heterogeneous systems
  • load balancing
  • multiple dispatchers

ASJC Scopus subject areas

  • Software
  • Computer Science Applications
  • Computer Networks and Communications
  • Electrical and Electronic Engineering

