
Enhancing the Text Mining Process by Implementation of Average-Stochastic Gradient Descent Weight Dropped Long-Short Memory

  • Received : 2022.07.05
  • Published : 2022.07.30

Abstract

Text mining is an important process for analyzing data collected from different sources such as videos, audio, and social media. Tools such as Natural Language Processing (NLP) are widely used in real-time applications. In earlier research, text mining approaches were implemented using long short-term memory (LSTM) networks. In this paper, text mining is performed using the average-stochastic gradient descent weight-dropped LSTM (AWD-LSTM) technique to obtain better accuracy and performance. The proposed model is demonstrated on Internet Movie Database (IMDB) reviews. It was implemented in Python because of the language's adaptability and flexibility when dealing with massive data sets and databases. The results show that the proposed LSTM plus weight-dropped plus embedding model achieved an accuracy of 88.36%, compared with 85.64% for previous AWD-LSTM models and 85.16% for a plain LSTM model, an improvement of 3.20 percentage points over the plain LSTM. Finally, the loss decreased from 0.341 to 0.299 with the proposed model.
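AWD-LSTM combines two ideas: "weight dropping", i.e. DropConnect applied to the hidden-to-hidden (recurrent) weights, and averaged SGD, which keeps a running average of the SGD iterates and uses that average as the final weights. The following is a minimal pure-Python sketch of both ideas, assuming that common reading of the acronym. The function names `weight_drop` and `averaged_sgd`, the toy 2x2 matrix, the quadratic loss, and the fixed `avg_start` trigger are hypothetical illustrations, not the authors' implementation. A real model would apply the mask to each LSTM gate's recurrent matrix inside a deep-learning framework. It would also typically start averaging only once validation performance stops improving (the non-monotonically triggered variant), rather than at a fixed step.

```python
import random
from typing import Callable, List, Tuple

Matrix = List[List[float]]


def weight_drop(w_hh: Matrix, p: float, rng: random.Random) -> Matrix:
    """DropConnect on a hidden-to-hidden weight matrix (illustrative sketch).

    Each weight is zeroed with probability p; survivors are scaled by
    1 / (1 - p) so the expected value of every entry is unchanged.
    Call this once per training sequence and reuse the returned matrix
    at every time step, so all steps see the same mask.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    return [[w / keep if rng.random() < keep else 0.0 for w in row]
            for row in w_hh]


def averaged_sgd(grad: Callable[[float], float], theta0: float, lr: float,
                 steps: int, avg_start: int) -> Tuple[float, float]:
    """Run plain SGD on one parameter and keep a running mean of the iterates.

    Averaging begins at step avg_start (a fixed, hypothetical trigger).
    Returns (last iterate, averaged iterate).
    """
    theta = theta0
    avg, count = 0.0, 0
    for t in range(steps):
        theta -= lr * grad(theta)
        if t >= avg_start:
            count += 1
            avg += (theta - avg) / count  # incremental mean update
    return theta, (avg if count else theta)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy 2x2 recurrent matrix; with p = 0.5 each entry becomes 0 or 2x.
    print(weight_drop([[0.1, -0.2], [0.3, 0.4]], p=0.5, rng=rng))

    # Noisy gradient of the toy loss f(theta) = (theta - 3)^2.
    def noisy_grad(th: float) -> float:
        return 2.0 * (th - 3.0) + rng.gauss(0.0, 1.0)

    last, avg = averaged_sgd(noisy_grad, theta0=0.0, lr=0.1,
                             steps=2000, avg_start=1000)
    print(f"last iterate {last:.3f}, averaged iterate {avg:.3f}")
```

Because the mask is applied to the weights rather than to the activations, and only once per sequence, it regularizes the recurrent connections without changing the LSTM's per-step computation. Averaging the iterates trades some responsiveness for lower variance in the final weights.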


Acknowledgement

The authors would like to thank the management of VNR Vignana Jyothi Institute of Engineering and Technology and Sreenidhi Institute of Science and Technology for their support and encouragement to carry out research programs at various stages.
