Applying Software-Managed Caching and CPU/GPU Task Scheduling for Accelerating Dynamic Workloads

Mark Silberstein, Assaf Schuster, John D. Owens

Research output: Chapter in Book/Report/Conference proceeding; Chapter; peer-review

Abstract

This chapter covers two difficult problems frequently encountered by graphics processing unit (GPU) developers: optimizing memory access for kernels with complex input-dependent access patterns, and mapping the computations to a GPU or a CPU in composite applications with multiple dependent kernels. Both pose a formidable challenge, as they require dynamic adaptation and tuning of execution policies to allow high performance for a wide range of inputs. Failing to meet these requirements leads to a substantial performance penalty. The chapter describes a methodology for solving the memory optimization problem via software-managed caching that efficiently exploits the fast scratchpad memory. This technique outperforms the cache-less and texture-memory-based approaches on pre-Fermi GPU architectures, as well as the approach that relies on the Fermi hardware cache alone. The chapter then presents an algorithm for minimizing the total running time of a complete application comprising multiple interdependent kernels. Both a GPU and a CPU can be used to execute the kernels, but performance varies greatly across inputs, calling for dynamic assignment of the computations to a GPU or a CPU at runtime. The communication overhead due to the data dependencies between the kernels makes per-kernel greedy selection of the best-performing device suboptimal. The algorithm optimizes the runtime of the complete application by evaluating the performance of all the assignments jointly, including the overhead of the data transfers between the devices.
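The point about greedy per-kernel selection can be illustrated with a small sketch. This is not the chapter's algorithm: it assumes a simple linear chain of kernels, a fixed transfer cost whenever two consecutive kernels run on different devices, and input data that is initially available on either device at no cost. The function names and all timings below are hypothetical, chosen only for illustration.

```python
DEVICES = ("cpu", "gpu")

def total_time(assignment, run_time, transfer_time):
    """Total running time of a kernel chain under a given device assignment."""
    total = 0.0
    for i, dev in enumerate(assignment):
        total += run_time[i][dev]
        # A transfer is paid whenever consecutive kernels run on different devices.
        if i > 0 and assignment[i - 1] != dev:
            total += transfer_time[i]
    return total

def greedy_assignment(run_time):
    """Pick the faster device for each kernel in isolation, ignoring transfers."""
    return tuple(min(DEVICES, key=lambda d: t[d]) for t in run_time)

def joint_assignment(run_time, transfer_time):
    """Dynamic programming over the chain.

    best[d] holds (cost, assignment) for the cheapest way to run the kernels
    processed so far with the last one placed on device d.
    """
    best = {d: (run_time[0][d], (d,)) for d in DEVICES}
    for i in range(1, len(run_time)):
        new_best = {}
        for d in DEVICES:
            candidates = []
            for p in DEVICES:
                cost, path = best[p]
                move = transfer_time[i] if p != d else 0.0
                candidates.append((cost + move + run_time[i][d], path + (d,)))
            new_best[d] = min(candidates)
        best = new_best
    return min(best.values())

# Hypothetical per-kernel timings (arbitrary units) for a three-kernel chain.
run_time = [
    {"cpu": 4.0, "gpu": 3.0},
    {"cpu": 2.0, "gpu": 3.0},
    {"cpu": 5.0, "gpu": 1.0},
]
# transfer_time[i]: assumed cost of moving kernel i's input between devices.
transfer_time = [0.0, 2.0, 2.0]

greedy = greedy_assignment(run_time)
greedy_cost = total_time(greedy, run_time, transfer_time)
joint_cost, joint = joint_assignment(run_time, transfer_time)
print(greedy, greedy_cost)
print(joint, joint_cost)
```

With these made-up numbers, the greedy choice moves the middle kernel to the CPU and pays for two transfers, while the joint evaluation keeps the whole chain on the GPU and finishes sooner. A real application with a general dependency graph, or with runtime performance prediction, would need a more elaborate formulation than this chain-only sketch.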

Original language: English
Title of host publication: GPU Computing Gems Jade Edition
Pages: 501-517
Number of pages: 17
State: Published - 2012

All Science Journal Classification (ASJC) codes

  • General Computer Science
