Performance Analysis and Identifying Characteristics of Processing-in-Memory System with Polyhedral Benchmark Suite

Analysis and Discussion of the Performance and Characteristics of Running PolyBench on a Processing-in-Memory System

  • Jeonggeun Kim (School of Computer Science and Engineering, College of IT Engineering, Kyungpook National University)
  • Received : 2023.09.10
  • Accepted : 2023.09.18
  • Published : 2023.09.30

Abstract

In this paper, we identify performance issues that arise when compute kernels from PolyBench are executed on Processing-in-Memory (PIM) devices. PolyBench collects compute kernels that form the core computational units of various data-intensive workloads, such as deep learning. Using our in-house simulator, we measured and compared various performance metrics of these workloads on systems built around conventional out-of-order and in-order processors and on PIM-based systems. The results show that the PIM-based system improves performance over the other computing models, owing to the short-term data reuse characteristic of PolyBench's computational kernels. However, some kernels perform poorly on PIM-based systems that lack a multi-level cache hierarchy, because those kernels exhibit long-term data reuse. Our evaluation and analysis therefore suggest that future research should consider dynamic, workload-pattern-adaptive approaches to overcome the performance degradation caused by computational kernels with long-term data reuse and hidden data locality.
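The distinction between short-term and long-term data reuse can be made concrete with reuse (LRU stack) distance: the number of distinct addresses touched between two accesses to the same address. The sketch below is illustrative only and is not the authors' in-house simulator. It assumes a fully associative LRU cache, in which an access hits exactly when its reuse distance is smaller than the cache capacity. The two access traces are hypothetical: a 1-D three-point stencil in the style of PolyBench's jacobi-1d (short-term reuse) and a matrix-vector product of the kind found in kernels such as mvt (the vector `x` is reused only once per row, i.e. long-term reuse). The capacities of 8 and 16 lines are arbitrary stand-ins for a small on-PIM buffer and a larger cache.

```python
from collections import OrderedDict


def reuse_distances(trace):
    """LRU stack (reuse) distance of every access in `trace`.

    The distance is the number of distinct addresses touched since the
    previous access to the same address; first-time (cold) accesses get None.
    """
    stack = OrderedDict()  # least recently used first, most recently used last
    distances = []
    for addr in trace:
        if addr in stack:
            keys = list(stack)
            # Addresses after `addr` in the stack were used more recently.
            distances.append(len(keys) - 1 - keys.index(addr))
            stack.move_to_end(addr)
        else:
            distances.append(None)
            stack[addr] = None
    return distances


def lru_hits(trace, capacity):
    """Hits in a fully associative LRU cache holding `capacity` lines.

    An access hits exactly when its reuse distance is smaller than the capacity.
    """
    return sum(1 for d in reuse_distances(trace) if d is not None and d < capacity)


def stencil_1d_trace(n):
    """Reads of A in B[i] = (A[i-1] + A[i] + A[i+1]) / 3 for i = 1..n-2."""
    trace = []
    for i in range(1, n - 1):
        trace += [("A", i - 1), ("A", i), ("A", i + 1)]
    return trace


def matvec_trace(n):
    """Reads in y[i] += A[i][j] * x[j]; y[i] is assumed to stay in a register."""
    trace = []
    for i in range(n):
        for j in range(n):
            trace += [("A", i, j), ("x", j)]
    return trace


if __name__ == "__main__":
    short = stencil_1d_trace(16)  # every reuse has distance 1 (short-term)
    long_ = matvec_trace(8)       # every reuse of x has distance 2n - 1 = 15 (long-term)
    for capacity in (8, 16):
        print(capacity, lru_hits(short, capacity), lru_hits(long_, capacity))
```

With these traces, the stencil's 26 reuses all hit even in the 8-line cache. By contrast, none of the matrix-vector product's 56 reuses of `x` hit until the capacity exceeds 15 lines. This is the kind of gap a system without a multi-level cache hierarchy cannot close. Reuse distance is used here because it is independent of any particular cache configuration.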


Acknowledgement

This research was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean Government (Ministry of Education) (NRF-2021R1I1A1A01059737) and by the MSIT (Ministry of Science and ICT), Korea, under the Innovative Human Resource Development for Local Intellectualization support program (IITP2023-RS-2022-00156389) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).
