TY - GEN
T1 - Using Multi-op instructions as a way to generate ASIPs with optimized pipeline structure
AU - Asher, Yosi Ben
AU - Lipov, Irina
AU - Tartakovsky, Vladislav
AU - Tiv, Dror
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/7/21
Y1 - 2014/7/21
N2 - We propose automatic synthesis of application-specific instruction set processors (ASIPs). We use pipelined execution of multi-op machine instructions; e.g., ∗(reg1∗reg2) = (∗reg3)+(∗reg4) (in C syntax) is an instruction with a pipeline of three memory stages and two arithmetic stages. The problem is, for a given set of loops, to find a pipeline configuration and a multi-op ISA that maximize the IPC (instructions per cycle) while minimizing the resource usage and the cost of interconnections to the register file of the resulting CPU. The algorithm is based on finding an efficient cover of a large graph by a small set of convex sub-graphs g_i that are consistent with a given pipeline structure. Unlike in previous work, the g_i are not synthesized into circuits that are executed in co-processor mode; rather, both the g_i and the rest of the program are executed by the same set of multi-op pipeline units. In this way we eliminate the overhead associated with the co-processor mode of regular ASIPs while maintaining their high IPC values. The main advantage of pipelined multi-op execution over VLIW instructions is shown to be the cost of interconnections between the CPU's execution units and the register file. Thus, we devise a grading function that, for each possible multi-op pipeline configuration, balances the expected IPC against the complexity of the interconnections. Using this grading function, we show that the VLIW configuration is not always the best choice.
AB - We propose automatic synthesis of application-specific instruction set processors (ASIPs). We use pipelined execution of multi-op machine instructions; e.g., ∗(reg1∗reg2) = (∗reg3)+(∗reg4) (in C syntax) is an instruction with a pipeline of three memory stages and two arithmetic stages. The problem is, for a given set of loops, to find a pipeline configuration and a multi-op ISA that maximize the IPC (instructions per cycle) while minimizing the resource usage and the cost of interconnections to the register file of the resulting CPU. The algorithm is based on finding an efficient cover of a large graph by a small set of convex sub-graphs g_i that are consistent with a given pipeline structure. Unlike in previous work, the g_i are not synthesized into circuits that are executed in co-processor mode; rather, both the g_i and the rest of the program are executed by the same set of multi-op pipeline units. In this way we eliminate the overhead associated with the co-processor mode of regular ASIPs while maintaining their high IPC values. The main advantage of pipelined multi-op execution over VLIW instructions is shown to be the cost of interconnections between the CPU's execution units and the register file. Thus, we devise a grading function that, for each possible multi-op pipeline configuration, balances the expected IPC against the complexity of the interconnections. Using this grading function, we show that the VLIW configuration is not always the best choice.
UR - http://www.scopus.com/inward/record.url?scp=84912521813&partnerID=8YFLogxK
U2 - https://doi.org/10.1109/FCCM.2014.16
DO - https://doi.org/10.1109/FCCM.2014.16
M3 - Conference contribution
T3 - Proceedings - 2014 IEEE 22nd International Symposium on Field-Programmable Custom Computing Machines, FCCM 2014
SP - 29
BT - Proceedings - 2014 IEEE 22nd International Symposium on Field-Programmable Custom Computing Machines, FCCM 2014
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 22nd IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2014
Y2 - 11 May 2014 through 13 May 2014
ER -