URL Filtering by Using Machine Learning

  • Saqib, Malik Najmus (Department of Cybersecurity, College of Computer Science and Engineering, University of Jeddah)
  • Received : 2022.08.05
  • Published : 2022.08.30

Abstract

The growth of technology has made many things easier, from small everyday tasks to complex ones. This growth has also brought illegal activities that are carried out using technology, ranging from displaying annoying messages to large-scale fraud. The easiest way for an attacker to carry out such activities is to convince a user to click on a malicious link. Classifying URLs as malicious or benign has therefore been a major concern for a decade. Blacklists were used initially for this purpose and are still in use today. They are efficient, but they have the drawback that they are difficult to update automatically. For this reason, blacklisting is being replaced by URL classification based on machine learning algorithms. In this paper, we use four machine learning classification algorithms to classify URLs as malicious or benign: support vector machine, random forest, k-nearest neighbor, and decision tree. The dataset used in this research has 36,694 instances. Precision, accuracy, and recall values are compared for the dataset with and without preprocessing.
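To make the classification setup in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes a small hypothetical set of lexical features (the paper's actual features and preprocessing are not given on this page), uses a hand-written nearest-neighbor vote with k = 3 as a stand-in for one of the four classifiers, and computes accuracy, precision, and recall. The URLs in `TRAIN` are made-up placeholders on reserved example domains, not entries from the dataset.

```python
import math
from collections import Counter
from urllib.parse import urlsplit


def lexical_features(url):
    """Turn a URL into a numeric vector (hypothetical feature set)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return [
        float(len(url)),                              # total URL length
        float(host.count(".")),                       # dots in the host name
        float(sum(c.isdigit() for c in url)),         # number of digits
        float(sum(url.count(c) for c in "@-_?=&%")),  # special characters
        1.0 if parts.scheme == "https" else 0.0,      # HTTPS flag
    ]


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training points closest to x."""
    ranked = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], x))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def scores(y_true, y_pred, positive="malicious"):
    """Accuracy, precision, and recall with `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up training examples (assumption: illustrative only).
TRAIN = [
    ("https://example.com/", "benign"),
    ("https://docs.example.org/guide", "benign"),
    ("https://news.example.net/today", "benign"),
    ("http://192.0.2.10/login.php?user=1&session=998877&redirect=%2Fbank",
     "malicious"),
    ("http://secure-update.account-verify.example.test/confirm?id=12345678&token=abc",
     "malicious"),
    ("http://free-prize.win-now.example.invalid/claim?ref=0099&uid=4455&x=%41",
     "malicious"),
]
TRAIN_X = [lexical_features(u) for u, _ in TRAIN]
TRAIN_Y = [label for _, label in TRAIN]


def classify(url, k=3):
    """Classify a single URL against the toy training set."""
    return knn_predict(TRAIN_X, TRAIN_Y, lexical_features(url), k)
```

Calling `classify(url)` on a held-out list of labeled URLs and passing the true and predicted labels to `scores` gives the three metrics the abstract compares. The features here have very different ranges (URL length dominates the distance), so the effect of a preprocessing step such as feature scaling could be examined by running the same evaluation with and without it.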

Acknowledgement

This work was funded by the University of Jeddah, Jeddah, Saudi Arabia, under grant No. (UJ-02-042-DR). The authors therefore gratefully acknowledge the University of Jeddah for its technical and financial support.
