TY - GEN
T1 - An exact algorithm for energy-efficient acceleration of task trees on CPU/GPU architectures
AU - Silberstein, Mark
AU - Maruyama, Naoya
PY - 2011
Y1 - 2011
N2 - We consider the problem of energy-efficient acceleration of applications comprising multiple interdependent tasks forming a dependency tree, on a hypothetical CPU/GPU system where both a CPU and a GPU can be powered off when idle. Each task in the tree can be invoked on either a GPU or a CPU, but the performance may vary: some run faster on a GPU, while others prefer a CPU, making the choice of the lowest-energy processor input dependent. Furthermore, greedily minimizing the energy consumption for each task is suboptimal because of the additional energy required for the communication between the tasks executed on different processors. We propose an efficient algorithm that takes into account the energy consumption of a CPU and a GPU for each task, as well as the communication costs of data transfers between them, and constructs an optimal acceleration schedule with provably minimal total consumed energy. We evaluate the algorithm in the context of a real application having a task dependency tree structure and show up to 2.5-fold improvement in the expected energy consumption over the best single processor schedule, and up to 50% improvement over the communication unaware schedule on real inputs. We also show how this algorithm can be used to speedup computations rather than minimize power consumption. We achieve achieve up to a 2-fold speedup in real CPU/GPU systems.
AB - We consider the problem of energy-efficient acceleration of applications comprising multiple interdependent tasks forming a dependency tree, on a hypothetical CPU/GPU system where both a CPU and a GPU can be powered off when idle. Each task in the tree can be invoked on either a GPU or a CPU, but the performance may vary: some run faster on a GPU, while others prefer a CPU, making the choice of the lowest-energy processor input dependent. Furthermore, greedily minimizing the energy consumption for each task is suboptimal because of the additional energy required for the communication between the tasks executed on different processors. We propose an efficient algorithm that takes into account the energy consumption of a CPU and a GPU for each task, as well as the communication costs of data transfers between them, and constructs an optimal acceleration schedule with provably minimal total consumed energy. We evaluate the algorithm in the context of a real application having a task dependency tree structure and show up to 2.5-fold improvement in the expected energy consumption over the best single processor schedule, and up to 50% improvement over the communication unaware schedule on real inputs. We also show how this algorithm can be used to speedup computations rather than minimize power consumption. We achieve achieve up to a 2-fold speedup in real CPU/GPU systems.
UR - http://www.scopus.com/inward/record.url?scp=79960091989&partnerID=8YFLogxK
U2 - 10.1145/1987816.1987826
DO - 10.1145/1987816.1987826
M3 - منشور من مؤتمر
SN - 9781450307734
T3 - ACM International Conference Proceeding Series
BT - Proceedings of the 4th Annual International Systems and Storage Conference, SYSTOR 2011
T2 - 4th Annual International Systems and Storage Conference, SYSTOR 2011
Y2 - 30 May 2011 through 1 June 2011
ER -