TY - GEN
T1 - Code alignment for architectures with pipeline group dispatching
AU - Boehm, Omer
AU - Haber, Gadi
AU - Kosachevsky, Helena
PY - 2010
Y1 - 2010
N2 - Today's architectures exploit long pipelines to increase instruction-level parallelism by grouping sets of consecutive instructions and feeding them into the pipeline so that they execute in a single cycle. The IBM Power architecture executes programs by dispatching groups of instructions, where a dispatch group is fed as a whole into the pipeline to be executed in a single cycle. Such an architecture, however, includes many cases of pipeline delays that result from dependencies between the resources of separate groups. As a result, the code must be optimized to help the architecture place instructions in a way that produces as few delays as possible. Optimizing the alignment and placement of the code is therefore crucial to program performance on such architectures. We show that in some cases, without proper code alignment, performance can degrade by 40% due to the impact of code alignment on grouping and pipeline delays. We present a new binary-level, profile-based code alignment algorithm for architectures that use group dispatching. We show a steady performance gain of about 2-3% for fully optimized code running on IBM Power 6, while completely eliminating the performance instability that, without the proposed algorithm, can sometimes cause up to 40% variation in performance. Because the algorithm is based on gathered profile data and applies at the binary level, it can be used as part of existing dynamic compilers and enabled on top of the operating system at runtime.
UR - http://www.scopus.com/inward/record.url?scp=77955008198&partnerID=8YFLogxK
U2 - 10.1145/1815695.1815725
DO - 10.1145/1815695.1815725
M3 - Conference contribution
AN - SCOPUS:77955008198
SN - 9781605589084
T3 - ACM International Conference Proceeding Series
BT - Proceedings of SYSTOR 2010 - The 3rd Annual Haifa Experimental Systems Conference
T2 - 3rd Annual Haifa Experimental Systems Conference, SYSTOR 2010
Y2 - 24 May 2010 through 26 May 2010
ER -