Accuracy of Phishing Websites Detection Algorithms by Using Three Ranking Techniques

  • Received : 2022.02.05
  • Published : 2022.02.28

Abstract

According to the FBI's Internet Crime Complaint Center, the US lost more than 2.1 billion USD to phishing attacks between 2014 and 2019, and COVID-19 scam complaints totaled more than 1,200. These figures reflect the damage that phishing attacks cause. Many approaches to phishing website (PW) detection appear in the literature. Earlier methods maintained a centralized, manually updated blacklist, which cannot detect newly created phishing sites. Several recent studies applied supervised machine learning (SML) algorithms and schemes, including those based on URL feature extraction, to the PW detection problem. These studies show that different classification algorithms are more effective on different datasets; however, no single classifier is widely recognized as best for the phishing site detection problem. This study aims to identify the SML features and schemes that work best against PWs across publicly available phishing datasets. Eight widely used classification algorithms from the scikit-learn library were configured and evaluated on three publicly accessible phishing datasets. Their classification accuracies on the three datasets were then tested for statistically significant differences using the Welch t-test. The results are reported in terms of classification accuracy and classifier ranking, as shown in Tables 4 and 8. Ensembles and neural networks outperform classical algorithms in this study. On severely imbalanced datasets, classifiers obtained classification accuracy higher than 99.0 percent. Finally, the results show that this approach could also be adapted and outperforms conventional techniques with good precision.
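The Welch t-test mentioned in the abstract compares the mean accuracies of two classifiers without assuming equal variances. The following is a minimal sketch of that statistic, not the authors' code: the function name `welch_t_test` and the accuracy values are hypothetical and do not come from the study. It uses only the Python standard library.

```python
from math import sqrt
from statistics import mean, variance


def welch_t_test(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    n_a, n_b = len(a), len(b)
    # Squared standard error of each sample mean (sample variance uses n - 1).
    se2_a = variance(a) / n_a
    se2_b = variance(b) / n_b
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical per-fold accuracies for two classifiers (illustrative only,
# not results reported in the paper).
ensemble_acc = [0.97, 0.96, 0.98, 0.97, 0.97]
classical_acc = [0.93, 0.91, 0.95, 0.92, 0.94]

t_stat, dof = welch_t_test(ensemble_acc, classical_acc)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

A large absolute value of t relative to the t distribution with `dof` degrees of freedom suggests that the accuracy difference is statistically significant. In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same test and also returns a p-value.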
