A layer-wise frequency scaling for a neural processing unit

Chung, Jaehoon;Kim, HyunMi;Shin, Kyoungseon;Lyuh, Chun-Gi;Cho, Yong Cheol Peter;Han, Jinho;Kwon, Youngsu;Gong, Young-Ho;Chung, Sung Woo;

doi:10.4218/etrij.2022-0094

ETRI Journal

Volume 44, Issue 5 / Pages 849-858 / 2022 / 1225-6463 (pISSN) / 2233-7326 (eISSN)

Electronics and Telecommunications Research Institute (한국전자통신연구원)

Chung, Jaehoon (AI SoC Research Division, Electronics and Telecommunications Research Institute) ;
Kim, HyunMi (AI SoC Research Division, Electronics and Telecommunications Research Institute) ;
Shin, Kyoungseon (AI SoC Research Division, Electronics and Telecommunications Research Institute) ;
Lyuh, Chun-Gi (AI SoC Research Division, Electronics and Telecommunications Research Institute) ;
Cho, Yong Cheol Peter (AI SoC Research Division, Electronics and Telecommunications Research Institute) ;
Han, Jinho (AI SoC Research Division, Electronics and Telecommunications Research Institute) ;
Kwon, Youngsu (AI SoC Research Division, Electronics and Telecommunications Research Institute) ;
Gong, Young-Ho (School of Computer and Information Engineering, Kwangwoon University) ;
Chung, Sung Woo (Department of Computer Science, Korea University)

Received : 2022.03.10
Accepted : 2022.06.23
Published : 2022.10.10

https://doi.org/10.4218/etrij.2022-0094

Abstract

Dynamic voltage and frequency scaling (DVFS) has been widely adopted for runtime power management of various processing units. For neural processing units (NPUs), power management of neural network applications must adjust the frequency and voltage at every layer to account for each layer's power behavior and performance. Unfortunately, DVFS is inappropriate for layer-wise runtime power management of NPUs because the latency of voltage scaling is long compared with the execution time of each layer. Because frequency scaling is fast enough to keep up with each layer, we propose a layer-wise dynamic frequency scaling (DFS) technique for an NPU. Our proposed DFS uses, for each layer, the highest frequency that stays under the power limit of the NPU. To determine the highest allowable frequency, we build a power model that predicts the power consumption of the NPU based on real measurements of the fabricated NPU. Our evaluation results show that the proposed DFS improves frames per second (FPS) by 33% and saves energy by 14% on average, compared with DVFS.
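
The selection step described in the abstract (pick, for each layer, the highest frequency whose predicted power stays under the limit) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the linear power model (reasonable only if the supply voltage is held fixed, so that dynamic power grows in proportion to frequency), the class and function names, the candidate frequencies, the coefficients, and the power limit.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LayerPowerModel:
    """Hypothetical per-layer power model at a fixed supply voltage.

    Assumes power = static + slope * frequency. The coefficients below are
    illustrative placeholders, not measurements from the fabricated NPU.
    """
    static_w: float           # frequency-independent power in watts
    dynamic_w_per_mhz: float  # dynamic power per MHz for this layer

    def predict(self, freq_mhz: float) -> float:
        """Predicted power in watts when the layer runs at freq_mhz."""
        return self.static_w + self.dynamic_w_per_mhz * freq_mhz


def pick_frequency(model: LayerPowerModel,
                   freqs_mhz: List[float],
                   power_limit_w: float) -> Optional[float]:
    """Return the highest candidate frequency within the power limit.

    Returns None when no candidate frequency satisfies the limit.
    """
    feasible = [f for f in freqs_mhz if model.predict(f) <= power_limit_w]
    return max(feasible) if feasible else None


def schedule(layers: Dict[str, LayerPowerModel],
             freqs_mhz: List[float],
             power_limit_w: float) -> Dict[str, Optional[float]]:
    """Choose one frequency per layer, independently of the other layers."""
    return {name: pick_frequency(model, freqs_mhz, power_limit_w)
            for name, model in layers.items()}


# Illustrative inputs (all values are assumptions).
candidate_freqs = [400.0, 600.0, 800.0, 1000.0]   # MHz
layers = {
    "conv": LayerPowerModel(static_w=2.0, dynamic_w_per_mhz=0.012),
    "fc":   LayerPowerModel(static_w=2.0, dynamic_w_per_mhz=0.005),
    "pool": LayerPowerModel(static_w=2.0, dynamic_w_per_mhz=0.030),
}
plan = schedule(layers, candidate_freqs, power_limit_w=10.0)
print(plan)
```

In this sketch a power-hungry layer is assigned a lower clock while a light layer runs at the top frequency, which is the intuition behind layer-wise scaling. A layer that exceeds the limit at every candidate frequency yields `None`, which a real controller would have to handle separately.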

Acknowledgement

This work was supported in part by the Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2018-0-00195) and in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1A2C2003500).

