Text Mining in Online Social Networks: A Systematic Review

  • Alhazmi, Huda N (Umm Al-Qura University, College of Computer and Information System)
  • Received : 2022.03.05
  • Published : 2022.03.30

Abstract

Online social networks contain a large amount of data that can be converted into valuable and insightful information, and text mining approaches make it possible to explore such large-scale data efficiently. This study therefore reviews the recent literature on text mining in online social networks in a way that produces valid and valuable knowledge for further research. The review identifies the text mining techniques used in social networking, the data used, the tools, and the challenges. Research questions were formulated, a search strategy and selection criteria were defined, and each selected paper was then analyzed to extract the data relevant to the research questions. The results show that Twitter and Facebook are the social media platforms most commonly used as data sources. The most common text mining techniques were sentiment analysis and topic modeling, and classification and clustering were the approaches most often applied in the studies. The challenges include the need to process huge volumes of data, the noise in the data, and its dynamic nature. The study explores recent developments in text mining approaches in social networking by providing a general view of the state of work in this research area.
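To make the most common combination reported above concrete (sentiment analysis framed as a classification task on noisy short posts), the following is a minimal sketch under stated assumptions; it is not the method of this review or of any reviewed study. It assumes a multinomial Naive Bayes classifier with Laplace smoothing, a simple noise-removal step (URLs, @mentions, and hashtag symbols), and a hypothetical four-post training set. The names `preprocess`, `train_nb`, `predict`, and `TRAIN` are illustrative only.

```python
import math
import re
from collections import Counter, defaultdict

# Patterns for common social-media noise (assumed cleaning rules, illustrative only).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z']+")


def preprocess(text):
    """Lowercase a post, drop URLs and @mentions, keep hashtag words, return tokens."""
    text = URL_RE.sub(" ", text.lower())
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")
    return TOKEN_RE.findall(text)


def train_nb(docs):
    """Collect per-label document counts and word counts from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = preprocess(text)
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior, using Laplace (add-one) smoothing."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in preprocess(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data; real studies label thousands of posts.
TRAIN = [
    ("I love this phone, great battery! https://t.co/x", "positive"),
    ("Great service and friendly staff @shop", "positive"),
    ("Terrible update, the app keeps crashing #fail", "negative"),
    ("I hate waiting, worst support ever", "negative"),
]

if __name__ == "__main__":
    model = train_nb(TRAIN)
    print(predict(model, "love the great camera"))       # positive
    print(predict(model, "worst app, keeps crashing"))   # negative
```

Naive Bayes is used here only because it is short and self-contained; the reviewed studies also report other classifiers (for example SVM and k-NN), and the preprocessing rules would need to be adapted to each platform and language.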
