TY - GEN
T1 - Using Multiple Clocks in Highlevel Synthesis to overcome unbalanced clock cycles
AU - Asher, Yosi Ben
AU - Qashqoush, Ibrahim
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - High-level synthesis (HLS) is a technique for compiling C/C++ algorithmic code directly to hardware circuits (Verilog/VHDL). Typically, HLS schedulers partition the graph of operations into layers L1, L2, ⋯, Lk, where the operations in each layer are executed in consecutive clock cycles. The execution time of each layer can differ, leading to unbalanced cycles, because the scheduler cannot ensure finding an optimal clock width by finding a common divisor of the layers' execution times. Previous works such as Re-clocking and Chaining attempt to balance the scheduling layers by moving operations from adjacent layers to the current one, optimizing clock latency, and finding an optimal clock width. However, unlike the solution proposed here, these techniques may sometimes fail to obtain the optimal solution because they cannot ensure finding a common divisor of the layers' execution times. The proposed technique is to use separate overlapping clocks clk1, clk2, ⋯, clkk, one for each layer, wherein the 0 → 1 transition of each clki occurs at Σ {latency(Lj) | Lj ∈ (longest path to Li)} and the 1 → 0 transition of each clki occurs at Σ {latency(Lj) | Lj ∈ (longest path to Li)} + latency(Li). In this way, the time wasted by unbalanced latencies disappears, yielding an optimal common clock period of each clki for straight-line code and a statistically near-optimal one for non-straight-line code. A second contribution of this work is an extension of this technique to perform global scheduling, i.e., to obtain balanced layers for multiple basic blocks of a full control-flow graph (e.g., a combination of nested loops and conditional statements). This technique was implemented in the LLVM compiler. Results suggest significant improvements in execution time and power compared to a commercial HLS tool (Vitis).
AB - High-level synthesis (HLS) is a technique for compiling C/C++ algorithmic code directly to hardware circuits (Verilog/VHDL). Typically, HLS schedulers partition the graph of operations into layers L1, L2, ⋯, Lk, where the operations in each layer are executed in consecutive clock cycles. The execution time of each layer can differ, leading to unbalanced cycles, because the scheduler cannot ensure finding an optimal clock width by finding a common divisor of the layers' execution times. Previous works such as Re-clocking and Chaining attempt to balance the scheduling layers by moving operations from adjacent layers to the current one, optimizing clock latency, and finding an optimal clock width. However, unlike the solution proposed here, these techniques may sometimes fail to obtain the optimal solution because they cannot ensure finding a common divisor of the layers' execution times. The proposed technique is to use separate overlapping clocks clk1, clk2, ⋯, clkk, one for each layer, wherein the 0 → 1 transition of each clki occurs at Σ {latency(Lj) | Lj ∈ (longest path to Li)} and the 1 → 0 transition of each clki occurs at Σ {latency(Lj) | Lj ∈ (longest path to Li)} + latency(Li). In this way, the time wasted by unbalanced latencies disappears, yielding an optimal common clock period of each clki for straight-line code and a statistically near-optimal one for non-straight-line code. A second contribution of this work is an extension of this technique to perform global scheduling, i.e., to obtain balanced layers for multiple basic blocks of a full control-flow graph (e.g., a combination of nested loops and conditional statements). This technique was implemented in the LLVM compiler. Results suggest significant improvements in execution time and power compared to a commercial HLS tool (Vitis).
UR - http://www.scopus.com/inward/record.url?scp=85184658448&partnerID=8YFLogxK
U2 - 10.1109/mcsoc60832.2023.00087
DO - 10.1109/mcsoc60832.2023.00087
M3 - Conference contribution
T3 - Proceedings - 2023 16th IEEE International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2023
SP - 552
EP - 559
BT - Proceedings - 2023 16th IEEE International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 16th IEEE International Symposium on Embedded Multicore/Many-Core Systems-on-Chip, MCSoC 2023
Y2 - 18 December 2023 through 21 December 2023
ER -