A Study on Performance of ML Algorithms and Feature Extraction to detect Malware

멀웨어 검출을 위한 기계학습 알고리즘과 특징 추출에 대한 성능연구

  • Received : 2017.12.29
  • Accepted : 2018.02.09
  • Published : 2018.02.28

Abstract

In this paper, we study how to classify whether an unknown PE file is malware. In the malware detection domain, both feature extraction and the choice of classifier are important. We therefore examine which features suit a classifier and which classifier suits the selected features, with the goal of finding a good combination of feature and classifier for detecting malware. The experiments proceed in two steps. In step one, we compare the accuracy obtained with three feature sets: opcodes only, Windows API calls only, and both combined. We found that the combined opcode and Windows API feature set performs better than either one alone. In step two, we compare the AUC values of four classifiers: Bernoulli Naïve Bayes, K-nearest neighbor, Support Vector Machine, and Decision Tree. We found that the Decision Tree performs better than the others.
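
The two-step comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' experimental code: the data is synthetic random binary features standing in for opcode and Windows API presence indicators, the classifier used in step one (a decision tree), the 70/30 split, the default hyperparameters, and all names (`synthetic_binary_features`, `scores_for_auc`, `feature_sets`) are hypothetical. It uses scikit-learn only.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_samples = 400
# Synthetic labels (assumption): 1 = malware, 0 = benign.
y = rng.integers(0, 2, size=n_samples)


def synthetic_binary_features(labels, n_features, n_informative, rng):
    """Random 0/1 features; the first n_informative columns are more
    likely to be 1 when the label is 1. Stand-in for real extraction."""
    X = (rng.random((labels.size, n_features)) < 0.3).astype(int)
    boost = rng.random((labels.size, n_informative)) < 0.4
    X[:, :n_informative] |= (boost & (labels[:, None] == 1)).astype(int)
    return X


def scores_for_auc(clf, X):
    """Continuous scores for ROC AUC: margin if available, else P(class 1)."""
    if hasattr(clf, "decision_function"):
        return clf.decision_function(X)
    return clf.predict_proba(X)[:, 1]


X_opcode = synthetic_binary_features(y, 50, 5, rng)
X_api = synthetic_binary_features(y, 30, 5, rng)
feature_sets = {
    "opcode": X_opcode,
    "api": X_api,
    "opcode+api": np.hstack([X_opcode, X_api]),
}

# Step one: compare accuracy across feature sets with one fixed classifier.
step1_accuracy = {}
for name, X in feature_sets.items():
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=0, stratify=y
    )
    clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    step1_accuracy[name] = accuracy_score(y_te, clf.predict(X_te))

# Step two: compare AUC across classifiers on the combined feature set.
classifiers = {
    "BernoulliNB": BernoulliNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(random_state=0),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
}
X_tr, X_te, y_tr, y_te = train_test_split(
    feature_sets["opcode+api"], y, test_size=0.3, random_state=0, stratify=y
)
step2_auc = {}
for name, clf in classifiers.items():
    clf.fit(X_tr, y_tr)
    step2_auc[name] = roc_auc_score(y_te, scores_for_auc(clf, X_te))

print("Step 1 accuracy:", step1_accuracy)
print("Step 2 AUC:", step2_auc)
```

Because the features here are random, the resulting numbers say nothing about which feature set or classifier is actually better; the sketch only shows the shape of the comparison. Reproducing the reported result would require features extracted from real PE files.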
