TY - JOUR
T1 - Enhancing DNN Computational Efficiency via Decomposition and Approximation
AU - Schweitzer, Ori
AU - Weiser, Uri
AU - Gabbay, Freddy
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - The computational demands of emerging deep neural networks (DNNs) continue to grow, fueled by their extensive computation intensity across diverse tasks and placing significant strain on hardware resources. This paper introduces DART, an adaptive microarchitecture that enhances the area, power, and energy efficiency of DNN accelerators through approximated computations and decomposition while preserving accuracy. DART improves DNN efficiency by leveraging adaptive resource allocation and simultaneous multi-threading (SMT). It exploits two prominent attributes of DNNs: resiliency and sparsity, at both the magnitude and bit levels. Our microarchitecture decomposes the Multiply-and-Accumulate (MAC) unit into fine-grained elementary computational resources. Additionally, DART employs an approximate representation that leverages dynamic and flexible allocation of the decomposed computational resources through SMT, thereby enhancing resource utilization and optimizing power consumption. We further improve efficiency by introducing a new Temporal SMT (tSMT) technique, which processes computations from temporally adjacent threads by expanding the time window for resource allocation. Our simulation analysis, using a systolic array accelerator as a case study, indicates that DART can achieve a more than 30% reduction in area and power with an accuracy degradation of less than 1% on state-of-the-art DNNs for vision and natural language processing (NLP) tasks, compared to conventional processing elements (PEs) using 8-bit integer MAC units.
AB - The computational demands of emerging deep neural networks (DNNs) continue to grow, fueled by their extensive computation intensity across diverse tasks and placing significant strain on hardware resources. This paper introduces DART, an adaptive microarchitecture that enhances the area, power, and energy efficiency of DNN accelerators through approximated computations and decomposition while preserving accuracy. DART improves DNN efficiency by leveraging adaptive resource allocation and simultaneous multi-threading (SMT). It exploits two prominent attributes of DNNs: resiliency and sparsity, at both the magnitude and bit levels. Our microarchitecture decomposes the Multiply-and-Accumulate (MAC) unit into fine-grained elementary computational resources. Additionally, DART employs an approximate representation that leverages dynamic and flexible allocation of the decomposed computational resources through SMT, thereby enhancing resource utilization and optimizing power consumption. We further improve efficiency by introducing a new Temporal SMT (tSMT) technique, which processes computations from temporally adjacent threads by expanding the time window for resource allocation. Our simulation analysis, using a systolic array accelerator as a case study, indicates that DART can achieve a more than 30% reduction in area and power with an accuracy degradation of less than 1% on state-of-the-art DNNs for vision and natural language processing (NLP) tasks, compared to conventional processing elements (PEs) using 8-bit integer MAC units.
KW - Approximate Computing
KW - Computer Architectures
KW - Deep Neural Networks
KW - Machine Learning Accelerators
UR - http://www.scopus.com/inward/record.url?scp=85213413755&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2024.3521980
DO - 10.1109/ACCESS.2024.3521980
M3 - Article
AN - SCOPUS:85213413755
SN - 2169-3536
JO - IEEE Access
JF - IEEE Access
ER -