A layer-wise frequency scaling for a neural processing unit

  • Chung, Jaehoon (AI SoC Research Division, Electronics and Telecommunications Research Institute) ;
  • Kim, HyunMi (AI SoC Research Division, Electronics and Telecommunications Research Institute) ;
  • Shin, Kyoungseon (AI SoC Research Division, Electronics and Telecommunications Research Institute) ;
  • Lyuh, Chun-Gi (AI SoC Research Division, Electronics and Telecommunications Research Institute) ;
  • Cho, Yong Cheol Peter (AI SoC Research Division, Electronics and Telecommunications Research Institute) ;
  • Han, Jinho (AI SoC Research Division, Electronics and Telecommunications Research Institute) ;
  • Kwon, Youngsu (AI SoC Research Division, Electronics and Telecommunications Research Institute) ;
  • Gong, Young-Ho (School of Computer and Information Engineering, Kwangwoon University) ;
  • Chung, Sung Woo (Department of Computer Science, Korea University)
  • Received : 2022.03.10
  • Accepted : 2022.06.23
  • Published : 2022.10.10

Abstract

Dynamic voltage and frequency scaling (DVFS) has been widely adopted for runtime power management of various processing units. For neural processing units (NPUs), power management of neural network applications must adjust the frequency and voltage for every layer to account for each layer's power behavior and performance. Unfortunately, DVFS is inappropriate for layer-wise runtime power management of NPUs because the latency of voltage scaling is long compared with the execution time of each layer. Because frequency scaling is fast enough to keep up with each layer, we propose a layer-wise dynamic frequency scaling (DFS) technique for an NPU. Our proposed DFS uses the highest frequency allowed under the power limit of an NPU for each layer. To determine the highest allowable frequency, we build a power model that predicts the power consumption of an NPU based on real measurements on the fabricated NPU. Our evaluation results show that our proposed DFS improves frames per second (FPS) by 33% and saves energy by 14% on average, compared with DVFS.
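
The selection step described in the abstract (pick, for each layer, the highest frequency whose predicted power stays under the limit) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The power model below is an assumption: at a fixed voltage, dynamic power is taken to grow linearly with frequency, so each layer gets a hypothetical coefficient `dyn_w_per_mhz` on top of a hypothetical constant `static_w`. The layer names, coefficients, frequency steps, and power limit are invented for illustration; the abstract does not give the model's actual form or the NPU's parameters.

```python
from typing import Dict, Sequence


def predict_power_w(static_w: float, dyn_w_per_mhz: float, freq_mhz: float) -> float:
    """Assumed model: power = static part + per-layer dynamic part * frequency."""
    return static_w + dyn_w_per_mhz * freq_mhz


def pick_layer_frequencies(
    layer_dyn_w_per_mhz: Dict[str, float],
    freqs_mhz: Sequence[float],
    static_w: float,
    power_limit_w: float,
) -> Dict[str, float]:
    """For each layer, return the highest frequency predicted to fit the limit.

    If no frequency step fits, fall back to the lowest step (an assumption;
    a real controller might throttle differently).
    """
    steps = sorted(freqs_mhz)
    chosen: Dict[str, float] = {}
    for layer, coeff in layer_dyn_w_per_mhz.items():
        best = steps[0]  # fallback when nothing fits under the limit
        for f in steps:
            if predict_power_w(static_w, coeff, f) <= power_limit_w:
                best = f  # steps are ascending, so the last fit is the highest
        chosen[layer] = best
    return chosen


# Hypothetical per-layer coefficients (W per MHz) and frequency steps (MHz).
layers = {"conv1": 0.010, "fc": 0.004, "pool": 0.020}
plan = pick_layer_frequencies(
    layers, freqs_mhz=[400, 600, 800, 1000], static_w=1.0, power_limit_w=8.0
)
print(plan)
```

In this sketch a power-hungry layer such as the hypothetical `conv1` is held at a lower step, while a lighter layer such as `fc` runs at the top step, which is the per-layer behavior the abstract attributes to layer-wise DFS.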

Acknowledgement

This work was supported in part by the Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2018-0-00195) and in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1A2C2003500).
