
Novel Optimizer AdamW+ Implementation in LSTM Model for DGA Detection

  • Awais Javed (National University of Sciences and Technology);
  • Adnan Rashdi (National University of Sciences and Technology);
  • Imran Rashid (National University of Sciences and Technology);
  • Faisal Amir (National University of Sciences and Technology)
  • Received : 2023.11.05
  • Published : 2023.11.30

Abstract

This work presents a deeper analysis of Adaptive Moment Estimation (Adam) and Adam with Weight Decay (AdamW) applied to a real-world text classification problem, DGA malware detection. AdamW improves on Adam by decoupling weight decay from L2 regularization. This work introduces AdamW+, a novel variant of AdamW that further simplifies the weight decay implementation. LSTM models for DGA malware detection trained with Adam, AdamW, and AdamW+ are evaluated on various DGA families/groups as a multiclass text classification task. The proposed AdamW+ optimizer shows improvement over Adam and AdamW on all standard performance metrics, and analysis of the outcomes shows that the novel optimizer outperforms both Adam and AdamW on text classification problems.
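For context, the key difference between Adam and AdamW is where weight decay enters the parameter update. The minimal NumPy sketch below is not the authors' implementation, and the paper's exact AdamW+ formulation is not given in the abstract; it only illustrates a single AdamW step with decoupled weight decay, with a comment marking the decay term that an AdamW+-style variant would further simplify.

    import numpy as np

    def adamw_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                   eps=1e-8, wd=1e-2):
        """One AdamW parameter update with decoupled weight decay."""
        m = beta1 * m + (1 - beta1) * grad        # biased first-moment estimate
        v = beta2 * v + (1 - beta2) * grad ** 2   # biased second-moment estimate
        m_hat = m / (1 - beta1 ** t)              # bias-corrected moments
        v_hat = v / (1 - beta2 ** t)
        # Adam with classic L2 regularization would fold wd * theta into grad
        # before the moment updates; AdamW decouples the decay from the moments
        # and applies it directly to the parameters. An AdamW+-style variant
        # would simplify only this decay term (hypothetical; not from the paper).
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps) - lr * wd * theta
        return theta, m, v

Called once per minibatch with t starting at 1, this reproduces the standard decoupled-decay update that the abstract contrasts with Adam's coupled L2 regularization.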

Keywords
