Building a Dynamic Analyzer for CUDA based System.

  • SALAH T. ALSHAMMARI (King Abdul-Aziz University, College of Computing and Information Technology)
  • Received : 2023.08.05
  • Published : 2023.08.30

Abstract

The use of GPUs in general-purpose computing is rising, driven by improved programmability and growing performance requirements. Tools such as NVIDIA's CUDA let programmers write algorithms in a C-like language for execution on graphics processing units (GPUs). Unfortunately, parallel programs are prone to both performance and correctness bugs, and CUDA tool support for finding them is not yet mature. A dynamic analyzer that finds performance and correctness bugs in CUDA programs makes it more practical to develop the sophisticated programs that modern computing requires. The analyzer targets two kinds of bugs: race conditions, which affect program correctness, and shared memory bank conflicts, which affect overall performance. The technique instruments a program to record the memory locations accessed by different threads and then checks those accesses for bugs. The instrumented source code is run directly in CUDA's device emulation mode, and all detected errors are reported to the user. This degree of automation helps programmers find and fix subtle bugs in highly complex programs, or in programs that cannot be analyzed manually.
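The two checks described in the abstract can be illustrated with a minimal host-side C++ sketch. It is not the paper's implementation: the `Access` record, the functions `has_race` and `bank_conflict_degree`, and the log layout are hypothetical names chosen for illustration. The sketch assumes the instrumented program logs shared memory accesses per barrier interval, that a race is two different threads touching the same word with at least one write, and that threads reading the same word do not conflict. The bank count is a parameter because it depends on the hardware generation.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical record of one shared-memory access, as an instrumented
// kernel might log it while running in emulation.
struct Access {
    int thread;        // thread index within the block
    std::size_t word;  // index of the 32-bit shared-memory word accessed
    bool is_write;     // true for a store, false for a load
};

// Threads that read or wrote one particular word.
struct WordState {
    std::set<int> readers;
    std::set<int> writers;
};

// Race check over the accesses of ONE barrier interval (assumption: the
// caller splits the log at each synchronization point). Reports a race
// when two different threads touch the same word and at least one writes.
bool has_race(const std::vector<Access>& log) {
    std::map<std::size_t, WordState> words;
    for (const Access& a : log) {
        WordState& s = words[a.word];
        (a.is_write ? s.writers : s.readers).insert(a.thread);
    }
    for (const auto& entry : words) {
        const WordState& s = entry.second;
        if (s.writers.size() > 1) return true;  // write-write race
        if (s.writers.size() == 1) {
            const int writer = *s.writers.begin();
            for (int reader : s.readers) {
                if (reader != writer) return true;  // read-write race
            }
        }
    }
    return false;
}

// Bank conflict check for ONE simultaneous access by a group of threads.
// Returns the largest number of distinct words that map to the same bank:
// 1 means conflict-free, n means the access is serialized n ways.
// Assumes consecutive 32-bit words map to consecutive banks.
std::size_t bank_conflict_degree(const std::vector<std::size_t>& words,
                                 std::size_t num_banks) {
    std::map<std::size_t, std::set<std::size_t>> per_bank;
    for (std::size_t w : words) {
        per_bank[w % num_banks].insert(w);
    }
    std::size_t degree = 0;
    for (const auto& entry : per_bank) {
        degree = std::max(degree, entry.second.size());
    }
    return degree;
}
```

For example, passing word indices 0, 2, 4, ..., 30 with 16 banks gives a degree of 2, because each even bank receives two distinct words, whereas indices 0 through 15 give a degree of 1. Keeping the checks on the host mirrors the idea of running instrumented code in emulation, where ordinary CPU data structures are available for bookkeeping.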
