TY - GEN
T1 - Control flow coalescing on a hybrid dataflow/von Neumann GPGPU
AU - Voitsechov, Dani
AU - Etsion, Yoav
N1 - Publisher Copyright: © 2015 ACM.
PY - 2015/12/5
Y1 - 2015/12/5
N2 - We propose the hybrid dataflow/von Neumann vector graph instruction word (VGIW) architecture. This data-parallel architecture concurrently executes each basic block's dataflow graph (graph instruction word) for a vector of threads, and schedules the different basic blocks based on von Neumann control flow semantics. The VGIW processor dynamically coalesces all threads that need to execute a specific basic block into a thread vector and, when the block is scheduled, executes the entire thread vector concurrently. The proposed control flow coalescing model enables the VGIW architecture to overcome the control flow divergence problem, which greatly impedes the performance and power efficiency of data-parallel architectures. Furthermore, using von Neumann control flow semantics enables the VGIW architecture to overcome the limitations of the recently proposed single-graph multiple-flows (SGMF) dataflow GPGPU, which is greatly constrained in the size of the kernels it can execute. Our evaluation shows that VGIW can achieve an average speedup of 3× (up to 11×) over an NVIDIA GPGPU, while providing an average 1.75× better energy efficiency (up to 7×).
KW - GPGPU
KW - SIMD
KW - dataflow
KW - reconfigurable architectures
UR - http://www.scopus.com/inward/record.url?scp=84959877381&partnerID=8YFLogxK
U2 - 10.1145/2830772.2830817
DO - 10.1145/2830772.2830817
M3 - Conference contribution
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 216
EP - 227
BT - Proceedings - 48th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2015
T2 - 48th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2015
Y2 - 5 December 2015 through 9 December 2015
ER -