TY - GEN
T1 - Distributed adaptive routing for big-data applications running on data center networks
AU - Zahavi, Eitan
AU - Keslassy, Isaac
AU - Kolodny, Avinoam
PY - 2012
Y1 - 2012
N2 - With the growing popularity of big-data applications, Data Center Networks increasingly carry larger and longer traffic flows. As a result of this increased flow granularity, static routing cannot efficiently load-balance traffic, resulting in increased network contention and reduced throughput. Unfortunately, while adaptive routing can solve this load-balancing problem, network designers refrain from using it, because it also creates out-of-order packet delivery that can significantly degrade the reliable transport performance of the longer flows. In this paper, we show that by throttling each flow bandwidth to half of the network link capacity, a distributed-adaptive-routing algorithm is able to converge to a non-blocking routing assignment within a few iterations, causing minimal out-of-order packet delivery. We present a Markov chain model for distributed-adaptive-routing in the context of Clos networks that provides an approximation for the expected convergence time. This model predicts that for full-link-bandwidth traffic, the convergence time is exponential in the network size, so out-of-order packet delivery is unavoidable for long messages. However, with half-rate traffic, the algorithm converges within a few iterations and exhibits weak dependency on the network size. Therefore, we show that distributed-adaptive-routing may be used to provide scalable, non-blocking routing even for long flows on a rearrangeably-non-blocking Clos network under half-rate conditions. The proposed model is evaluated and approximately fits the abstract system simulation model. Hardware implementation guidelines are provided and evaluated using a detailed flit-level InfiniBand simulation model. These results directly apply to adaptive-routing systems designed and deployed in various fields.
AB - With the growing popularity of big-data applications, Data Center Networks increasingly carry larger and longer traffic flows. As a result of this increased flow granularity, static routing cannot efficiently load-balance traffic, resulting in increased network contention and reduced throughput. Unfortunately, while adaptive routing can solve this load-balancing problem, network designers refrain from using it, because it also creates out-of-order packet delivery that can significantly degrade the reliable transport performance of the longer flows. In this paper, we show that by throttling each flow bandwidth to half of the network link capacity, a distributed-adaptive-routing algorithm is able to converge to a non-blocking routing assignment within a few iterations, causing minimal out-of-order packet delivery. We present a Markov chain model for distributed-adaptive-routing in the context of Clos networks that provides an approximation for the expected convergence time. This model predicts that for full-link-bandwidth traffic, the convergence time is exponential in the network size, so out-of-order packet delivery is unavoidable for long messages. However, with half-rate traffic, the algorithm converges within a few iterations and exhibits weak dependency on the network size. Therefore, we show that distributed-adaptive-routing may be used to provide scalable, non-blocking routing even for long flows on a rearrangeably-non-blocking Clos network under half-rate conditions. The proposed model is evaluated and approximately fits the abstract system simulation model. Hardware implementation guidelines are provided and evaluated using a detailed flit-level InfiniBand simulation model. These results directly apply to adaptive-routing systems designed and deployed in various fields.
KW - adaptive routing
KW - big-data
KW - data center networks
UR - http://www.scopus.com/inward/record.url?scp=84871344241&partnerID=8YFLogxK
U2 - 10.1145/2396556.2396578
DO - 10.1145/2396556.2396578
M3 - Conference publication
SN - 9781450316859
T3 - ANCS 2012 - Proceedings of the 8th ACM/IEEE Symposium on Architectures for Networking and Communications Systems
SP - 99
EP - 110
BT - ANCS 2012 - Proceedings of the 8th ACM/IEEE Symposium on Architectures for Networking and Communications Systems
T2 - 8th ACM/IEEE Symposium on Architectures for Networking and Communications Systems, ANCS 2012
Y2 - 29 October 2012 through 30 October 2012
ER -