Nap: Network-Aware Data Partitions for Efficient Distributed Processing

Or Raz, Chen Avin, Stefan Schmid

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    Abstract

    To support emerging data-intensive applications, many frameworks, such as MapReduce, have been developed in recent years to process big data sets efficiently in a distributed manner. However, these frameworks are often optimized for relatively homogeneous environments, and accounting for, e.g., the varying connectivity of wide-area network infrastructure may require complex placement algorithms. In this paper, we present Nap, which allows optimizing distributed data processing frameworks such as MapReduce for heterogeneous environments. Nap adapts resources dynamically, without requiring complex placement or migration algorithms or modifications to the logic of the mappers and reducers. Rather, Nap simply changes the data partition by spawning virtual nodes (e.g., reducers) depending on the demand. To this end, Nap leverages a connection to integer partition problems and employs Young lattices to guarantee minimal completion times (i.e., the makespan). Nap comes with provable performance guarantees and also supports applications that leverage redundancy to speed up executions further. As a case study to demonstrate our framework, we show how to efficiently execute multiway joins across wide-area networks with limited bandwidth. Our experiments, based on a proof-of-concept prototype implementation, confirm the potential of Nap to reduce completion times.

    Original language: American English
    Title of host publication: 2019 IEEE 18th International Symposium on Network Computing and Applications, NCA 2019
    Editors: Aris Gkoulalas-Divanis, Mirco Marchetti, Dimiter R. Avresky
    ISBN (Electronic): 9781728125220
    State: Published - 1 Sep 2019
    Event: 18th IEEE International Symposium on Network Computing and Applications, NCA 2019 - Cambridge, United States
    Duration: 26 Sep 2019 to 28 Sep 2019

    Publication series

    Name: 2019 IEEE 18th International Symposium on Network Computing and Applications, NCA 2019

    Conference

    Conference: 18th IEEE International Symposium on Network Computing and Applications, NCA 2019
    Country/Territory: United States
    City: Cambridge
    Period: 26/09/19 to 28/09/19

    Keywords

    • Distributed systems
    • Young lattices
    • heterogeneity
    • multiway joins
    • networks

    All Science Journal Classification (ASJC) codes

    • Information Systems and Management
    • Computer Networks and Communications
    • Computer Science Applications
    • Safety, Risk, Reliability and Quality
