Abstract
We consider two factors that can dominate the performance of fine-grain parallel programs on multicore machines:
• Cache coherency protocols, which keep caches coherent and in doing so add large overhead.
• The number of real kernel threads used to execute the possibly large number of program threads explicitly generated by the program's parallel constructs.
To address the first factor, we designed a cache-aware scheduling scheme that uses a memory profile to schedule threads so that cache misses are minimized. To address the second, we implemented a lazy thread system that replaces threads with loop iterations and function calls, minimizing the number of real threads spawned during execution. These two techniques can conflict: reducing the number of real threads that are generated also reduces the cache-aware scheduler's degrees of freedom for minimizing cache misses. We consider ParC, a programming language that is similar to OpenMP but supports more general scoping rules, and designed a lazy thread system for it, enhanced with cache-aware scheduling. Our results show that cache-aware scheduling can be effective even with very aggressive lazy thread optimizations. The implementation of the scheduling system is optimized for the MESI cache coherency protocol.
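The lazy thread idea described above can be illustrated with a minimal sketch. It is not the ParC runtime: the class `LazyRuntime`, its `parfor` method, and the `max_real_threads` budget are hypothetical names chosen for this example, and Python is used only for brevity. The sketch assumes a simple policy in which an iteration gets a real thread only while the number of live threads is under the budget; otherwise it runs as an ordinary function call in the caller.

```python
import threading


class LazyRuntime:
    """Hypothetical lazy-thread runtime (illustration only, not ParC)."""

    def __init__(self, max_real_threads):
        self.max_real_threads = max_real_threads  # assumed thread budget
        self.live = 0        # real threads currently running
        self.spawned = 0     # real threads created so far
        self.inlined = 0     # iterations executed as plain function calls
        self.lock = threading.Lock()

    def _try_reserve(self):
        # Reserve a slot for a real thread if the budget allows it.
        with self.lock:
            if self.live < self.max_real_threads:
                self.live += 1
                self.spawned += 1
                return True
            self.inlined += 1
            return False

    def _run_and_release(self, body, i):
        # Run one iteration in a real thread, then give the slot back.
        try:
            body(i)
        finally:
            with self.lock:
                self.live -= 1

    def parfor(self, n, body):
        """Run body(0..n-1), spawning real threads only under the budget."""
        workers = []
        for i in range(n):
            if self._try_reserve():
                t = threading.Thread(target=self._run_and_release,
                                     args=(body, i))
                t.start()
                workers.append(t)
            else:
                body(i)  # lazy path: the "thread" becomes a function call
        for t in workers:
            t.join()


def squares_with_budget(n, budget):
    """Helper for the example: fill a list with squares via parfor."""
    out = [0] * n

    def body(i):
        out[i] = i * i

    rt = LazyRuntime(max_real_threads=budget)
    rt.parfor(n, body)
    return out, rt
```

Calling `squares_with_budget(8, 0)` runs every iteration inline, while a larger budget lets some iterations run in real threads. The trade-off mentioned in the abstract shows up here: a smaller budget means fewer real threads, and therefore fewer placement decisions left for a cache-aware scheduler to make.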
Original language | English |
---|---|
State | Published - 2012 |
Event | The 2012 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'12) - Duration: 1 Jan 2012 → … |