An Ensemble Approach to Detect Fake News Spreaders on Twitter

  • Received: 2022.05.05
  • Published: 2022.05.30

Abstract

Detecting fake news is a complex and challenging task. The generation of fake news is very hard to stop; only steps that control its circulation can help minimize its impact. Humans tend to believe misleading false information, and researchers began by classifying news on social media sites as real or fake. False information can mislead an individual or an organization and may cause major failures and financial losses. Automatic detection of false information circulating on social media is an emerging research area that has attracted the attention of both industry and academia since the 2016 US presidential election. Fake news has severe negative effects on individuals and organizations and extends its hostile effects to society as a whole, so predicting it in a timely manner is important.

This research focuses on detecting fake news spreaders. In total, six models were developed and were trained and tested on the PAN 2020 dataset. Four N-gram-based models and a user-statistics-based model were trained with different hyperparameter values, and an extensive grid search with cross-validation was applied to each machine learning model. For the N-gram-based models, the research concentrated on the algorithms that a close reading of state-of-the-art related work identified as yielding better results: Random Forest, Logistic Regression, SVM, and XGBoost. All four algorithms were trained with hyperparameters selected by cross-validated grid search. The advantages of this research over previous work are the user-statistics-based model and the ensemble learning model, both designed to classify Twitter users as fake news spreaders or not with the highest reliability. The user-statistics model uses 17 features, on the basis of which it determines whether a Twitter user is malicious.
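The N-gram pipeline described above (feature extraction followed by a cross-validated grid search over hyperparameters) can be sketched as follows. This is a minimal standard-library illustration, not the authors' code: it assumes character n-grams weighted by relative frequency, uses a small from-scratch logistic regression (one of the four algorithms named above) in place of library implementations, and searches a hypothetical grid over the n-gram length `n` and an L2 penalty `l2`. All function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def char_ngrams(text, n):
    """Relative frequencies of overlapping character n-grams in lower-cased text."""
    text = text.lower()
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values()) or 1
    return {gram: c / total for gram, c in counts.items()}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, l2=0.0, lr=1.0, epochs=200):
    """Fit L2-regularised logistic regression on sparse dict features by batch gradient descent."""
    w, b, m = {}, 0.0, len(X)
    for _ in range(epochs):
        grad, grad_b = {}, 0.0
        for x, label in zip(X, y):
            err = sigmoid(b + sum(w.get(f, 0.0) * v for f, v in x.items())) - label
            for f, v in x.items():
                grad[f] = grad.get(f, 0.0) + err * v
            grad_b += err
        # Gradient was computed at the current point; now step every weight and the bias.
        for f in set(w) | set(grad):
            w[f] = w.get(f, 0.0) - lr * (grad.get(f, 0.0) / m + l2 * w.get(f, 0.0))
        b -= lr * grad_b / m
    return w, b


def predict(model, x):
    """Return 1 (spreader) if P(spreader | x) >= 0.5, else 0."""
    w, b = model
    return int(sigmoid(b + sum(w.get(f, 0.0) * v for f, v in x.items())) >= 0.5)


def cv_accuracy(docs, labels, n, l2, k=3, seed=0):
    """k-fold cross-validated accuracy, pooled over all held-out folds, for one setting."""
    X = [char_ngrams(d, n) for d in docs]
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for i in range(k):
        fold = idx[i::k]
        held_out = set(fold)
        train = [j for j in idx if j not in held_out]
        model = train_logreg([X[j] for j in train], [labels[j] for j in train], l2=l2)
        correct += sum(predict(model, X[j]) == labels[j] for j in fold)
    return correct / len(docs)


def grid_search(docs, labels, grid):
    """Score every (n, l2) pair and return (best_cv_accuracy, n, l2).

    Ties are broken toward the larger n, then the larger l2 (tuple comparison).
    """
    return max(
        (cv_accuracy(docs, labels, n, l2), n, l2)
        for n in grid["n"]
        for l2 in grid["l2"]
    )


if __name__ == "__main__":
    # Hypothetical toy "user timelines": 1 = spreader, 0 = not a spreader.
    docs = [
        "BREAKING!!! share this before they delete it!!!",
        "they are hiding the truth, share now!!!",
        "shocking cure doctors hate, share!!!",
        "lunch meeting moved to noon tomorrow",
        "weekly report attached for your review",
        "see you at the seminar this afternoon",
    ]
    labels = [1, 1, 1, 0, 0, 0]
    print(grid_search(docs, labels, {"n": [2, 3], "l2": [0.0, 0.01]}))
```

Swapping in another classifier only changes `train_logreg` and `predict`; the grid-search and cross-validation loop stays the same, which is the property that lets the same tuning procedure be applied to each of the four algorithms.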
A new dataset was then constructed from the predictions of the base machine learning models, and three combination techniques (simple mean, logistic regression, and random forest) were applied to build the ensemble model. The ensemble combined with logistic regression gave the best training and testing results, achieving an accuracy of 72%.
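The ensemble step (building a new dataset from base-model predictions, then combining them) corresponds to stacking. The sketch below is a standard-library illustration under stated assumptions, not the authors' implementation: each base model is assumed to output a probability that a user is a spreader, the simple-mean combiner averages those probabilities, and the logistic-regression combiner is a small from-scratch meta-learner. The random-forest combiner is omitted for brevity, and all names and numbers are hypothetical.

```python
import math


def build_meta_dataset(base_predictions):
    """Transpose per-model probability lists into one meta-feature row per user.

    base_predictions[j][i] is base model j's P(spreader) for user i, so row i
    of the result is [P_model_0(i), P_model_1(i), ...].
    """
    return [list(row) for row in zip(*base_predictions)]


def mean_ensemble(rows):
    """Simple-mean combiner: average the base-model probabilities for each user."""
    return [sum(r) / len(r) for r in rows]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_meta_logreg(rows, labels, lr=1.0, epochs=500):
    """Logistic-regression meta-learner fitted by batch gradient descent."""
    d, m = len(rows[0]), len(rows)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad, grad_b = [0.0] * d, 0.0
        for r, y in zip(rows, labels):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, r))) - y
            for j in range(d):
                grad[j] += err * r[j]
            grad_b += err
        # Step all weights and the bias using the mean gradient.
        w = [wj - lr * g / m for wj, g in zip(w, grad)]
        b -= lr * grad_b / m
    return w, b


def predict_meta(model, rows):
    """Threshold the meta-learner's probability at 0.5 (1 = spreader)."""
    w, b = model
    return [int(sigmoid(b + sum(wj * xj for wj, xj in zip(w, r))) >= 0.5) for r in rows]


if __name__ == "__main__":
    # Hypothetical P(spreader) from three base models for four users.
    base = [
        [0.9, 0.8, 0.1, 0.2],
        [0.8, 0.9, 0.2, 0.1],
        [0.85, 0.7, 0.15, 0.3],
    ]
    labels = [1, 1, 0, 0]
    rows = build_meta_dataset(base)
    mean_labels = [int(p >= 0.5) for p in mean_ensemble(rows)]
    stacked = predict_meta(train_meta_logreg(rows, labels), rows)
    print(mean_labels, stacked)
```

In general, the meta-features for stacking should be out-of-fold predictions (each user's base-model probabilities produced by models that did not see that user during training); otherwise the meta-learner is fitted on overconfident in-sample predictions. The abstract does not state how the base predictions were generated.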
