Code alignment for architectures with pipeline group dispatching

Omer Boehm; Gadi Haber; Helena Kosachevsky

doi:10.1145/1815695.1815725

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Today's architectures exploit long pipelines to increase instruction-level parallelism, grouping sets of consecutive instructions and feeding them into the pipeline so that they can be executed in a single cycle. The IBM Power architecture executes programs by dispatching groups of instructions: each dispatch group is fed into the pipeline as a whole and executed in a single cycle. Such architectures, however, suffer from many pipeline delays that result from dependencies between the resources of separate groups. As a result, the code must be optimized to help the hardware place instructions so that as few delays as possible occur. Optimizing the alignment and placement of the code is therefore crucial to program performance on such architectures. We show that in some cases, without proper code alignment, performance can degrade by 40% because alignment affects instruction grouping and pipeline delays. We present a new binary-level, profile-based code alignment algorithm for architectures that use group dispatching. We show a steady performance gain of about 2-3% for fully optimized code running on IBM POWER6, while completely eliminating the performance instability that, in the absence of the proposed algorithm, can cause up to 40% variation in performance. Because the algorithm relies on gathered profile information and operates at the binary level, it can be used as part of existing dynamic compilers and enabled on top of the operating system at runtime.
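The abstract says that alignment changes how consecutive instructions fall into dispatch groups, and that profile data can guide where to realign code. The following Python sketch illustrates that interaction on a deliberately simplified machine. It is not the authors' algorithm. Every name and parameter in it is a hypothetical assumption:

- the group size (`GROUP_SIZE = 5`);
- the 32-byte fetch block that a group may not span (`FETCH_BYTES`);
- the rule that a branch closes its group;
- the helper names `count_groups` and `align_layout`;
- the cost model, which charges one unit per padding nop and assumes the padding is never executed.

```python
"""Toy model of how code alignment changes dispatch-group formation.

All machine parameters and grouping rules below are hypothetical
simplifications chosen for illustration; they are not the IBM POWER6
rules and not the algorithm from the paper.
"""

INSN_BYTES = 4    # fixed-width instructions
GROUP_SIZE = 5    # hypothetical maximum instructions per dispatch group
FETCH_BYTES = 32  # hypothetical fetch block; a group may not span one


def count_groups(block: list[bool], start: int) -> int:
    """Count dispatch groups for a basic block placed at byte address `start`.

    `block` holds one flag per instruction: True if it is a branch.
    Hypothetical rules: a new group opens when no group is open, when the
    current group is full, or when an instruction begins a new fetch block;
    a branch closes the group it belongs to.
    """
    groups = 0
    size = 0  # instructions in the open group; 0 means no group is open
    for i, is_branch in enumerate(block):
        addr = start + i * INSN_BYTES
        if size == 0 or size == GROUP_SIZE or addr % FETCH_BYTES == 0:
            groups += 1  # open a new dispatch group
            size = 0
        size += 1
        if is_branch:
            size = 0  # the branch closes its group
    return groups


def align_layout(blocks: list[tuple[list[bool], int]], base: int = 0,
                 pad_cost: int = 1) -> tuple[list[int], int]:
    """Greedily choose nop padding (in instruction slots) before each block.

    `blocks` is a list of (instructions, profiled execution count) in layout
    order. Each candidate padding is scored as count * groups + pad_cost * pad,
    so hot blocks are worth padding and cold blocks are left alone. Padding is
    assumed never to execute (e.g. it follows an unconditional branch).
    Returns the chosen paddings and the profile-weighted group count.
    """
    max_pad = FETCH_BYTES // INSN_BYTES - 1
    addr = base
    pads: list[int] = []
    weighted_groups = 0
    for insns, count in blocks:
        # min() keeps the first minimum, so ties go to the smallest padding.
        best = min(
            range(max_pad + 1),
            key=lambda p: (count * count_groups(insns, addr + p * INSN_BYTES)
                           + pad_cost * p),
        )
        pads.append(best)
        addr += best * INSN_BYTES
        weighted_groups += count * count_groups(insns, addr)
        addr += len(insns) * INSN_BYTES
    return pads, weighted_groups


if __name__ == "__main__":
    block = [False] * 4 + [True]  # four non-branch instructions, then a branch
    print(count_groups(block, 0), count_groups(block, 20))  # → 1 2
    # A cold block, a hot block, and another cold block laid out back to back.
    print(align_layout([(block, 1), (block, 1000), (block, 1)]))  # → ([0, 3, 0], 1003)
```

With this toy profile, the greedy pass leaves the two cold blocks unpadded. It pads the hot block by three slots so that the block starts on a fetch-block boundary and dispatches as one group instead of two. A greedy pass is only one possible design. Because padding shifts every later block, a real tool would also have to weigh the code-size and instruction-cache cost of the inserted nops.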

Original language	English
Title of host publication	Proceedings of SYSTOR 2010 - The 3rd Annual Haifa Experimental Systems Conference
DOIs	https://doi.org/10.1145/1815695.1815725
State	Published - 2010
Externally published	Yes
Event	3rd Annual Haifa Experimental Systems Conference, SYSTOR 2010 - Haifa, Israel
Duration	24 May 2010 → 26 May 2010

Publication series

Name	ACM International Conference Proceeding Series

Conference

Conference	3rd Annual Haifa Experimental Systems Conference, SYSTOR 2010
Country/Territory	Israel
City	Haifa
Period	24/05/10 → 26/05/10

ASJC Scopus subject areas

Software
Human-Computer Interaction
Computer Vision and Pattern Recognition
Computer Networks and Communications

Cite this

@inproceedings{98b3a6c763a748dcb89666b83fc24d79,

title = "Code alignment for architectures with pipeline group dispatching",

abstract = "Today's architectures exploit long pipelines to increase instruction-level parallelism, grouping sets of consecutive instructions and feeding them into the pipeline so that they can be executed in a single cycle. The IBM Power architecture executes programs by dispatching groups of instructions: each dispatch group is fed into the pipeline as a whole and executed in a single cycle. Such architectures, however, suffer from many pipeline delays that result from dependencies between the resources of separate groups. As a result, the code must be optimized to help the hardware place instructions so that as few delays as possible occur. Optimizing the alignment and placement of the code is therefore crucial to program performance on such architectures. We show that in some cases, without proper code alignment, performance can degrade by 40\% because alignment affects instruction grouping and pipeline delays. We present a new binary-level, profile-based code alignment algorithm for architectures that use group dispatching. We show a steady performance gain of about 2--3\% for fully optimized code running on IBM POWER6, while completely eliminating the performance instability that, in the absence of the proposed algorithm, can cause up to 40\% variation in performance. Because the algorithm relies on gathered profile information and operates at the binary level, it can be used as part of existing dynamic compilers and enabled on top of the operating system at runtime.",

author = "Omer Boehm and Gadi Haber and Helena Kosachevsky",

year = "2010",

doi = "10.1145/1815695.1815725",

language = "English",

isbn = "9781605589084",

series = "ACM International Conference Proceeding Series",

booktitle = "Proceedings of SYSTOR 2010 - The 3rd Annual Haifa Experimental Systems Conference",

note = "3rd Annual Haifa Experimental Systems Conference, SYSTOR 2010; Conference date: 24-05-2010 through 26-05-2010",

}

Boehm, O, Haber, G & Kosachevsky, H 2010, 'Code alignment for architectures with pipeline group dispatching', in Proceedings of SYSTOR 2010 - The 3rd Annual Haifa Experimental Systems Conference, 23, ACM International Conference Proceeding Series, 3rd Annual Haifa Experimental Systems Conference, SYSTOR 2010, Haifa, Israel, 24/05/10. https://doi.org/10.1145/1815695.1815725

TY - GEN

T1 - Code alignment for architectures with pipeline group dispatching

AU - Boehm, Omer

AU - Haber, Gadi

AU - Kosachevsky, Helena

PY - 2010

Y1 - 2010

AB - Today's architectures exploit long pipelines to increase instruction-level parallelism, grouping sets of consecutive instructions and feeding them into the pipeline so that they can be executed in a single cycle. The IBM Power architecture executes programs by dispatching groups of instructions: each dispatch group is fed into the pipeline as a whole and executed in a single cycle. Such architectures, however, suffer from many pipeline delays that result from dependencies between the resources of separate groups. As a result, the code must be optimized to help the hardware place instructions so that as few delays as possible occur. Optimizing the alignment and placement of the code is therefore crucial to program performance on such architectures. We show that in some cases, without proper code alignment, performance can degrade by 40% because alignment affects instruction grouping and pipeline delays. We present a new binary-level, profile-based code alignment algorithm for architectures that use group dispatching. We show a steady performance gain of about 2-3% for fully optimized code running on IBM POWER6, while completely eliminating the performance instability that, in the absence of the proposed algorithm, can cause up to 40% variation in performance. Because the algorithm relies on gathered profile information and operates at the binary level, it can be used as part of existing dynamic compilers and enabled on top of the operating system at runtime.

UR - http://www.scopus.com/inward/record.url?scp=77955008198&partnerID=8YFLogxK

U2 - 10.1145/1815695.1815725

DO - 10.1145/1815695.1815725

M3 - Conference contribution

AN - SCOPUS:77955008198

SN - 9781605589084

T3 - ACM International Conference Proceeding Series

BT - Proceedings of SYSTOR 2010 - The 3rd Annual Haifa Experimental Systems Conference

T2 - 3rd Annual Haifa Experimental Systems Conference, SYSTOR 2010

Y2 - 24 May 2010 through 26 May 2010

ER -
