Developing and Pre-Processing a Dataset using a Rhetorical Relation to Build a Question-Answering System based on an Unsupervised Learning Approach

  • Dutta, Ashit Kumar (Department of Computer Science and Information Systems, College of Applied Sciences, AlMaarefa University) ;
  • Wahab Sait, Abdul Rahaman (Center of Documents and Archive, King Faisal University) ;
  • Keshta, Ismail Mohamed (Department of Computer Science and Information Systems, College of Applied Sciences, AlMaarefa University) ;
  • Elhalles, Abheer (Department of Computer Science and Information Systems, College of Applied Sciences, AlMaarefa University)
  • Received : 2021.11.05
  • Published : 2021.11.30

Abstract

Rhetorical relations between two text fragments carry essential information and help natural language processing applications, such as question-answering (QA) systems and automatic text summarization, produce effective results. A QA system enables users to retrieve a meaningful response to a query. Developing such a system requires rhetorical-relation-based datasets so that it can interpret and respond to user requests. Few datasets are available for developing an Arabic QA system, and consequently effective QA systems for the Arabic language are lacking. Recent research indicates that unsupervised learning can help a QA system reply to user queries. In this study, the researchers develop a rhetorical-relation-based dataset for implementing unsupervised learning applications. A web crawler is developed to collect Arabic content from the web, and a discourse-annotated corpus is generated using Rhetorical Structure Theory. A Naïve Bayes based QA system is developed to evaluate the dataset. The results show that the QA system's performance improves with the proposed dataset and that it can answer user queries with appropriate responses. In addition, the results on fine-grained and coarse-grained relations indicate that the dataset is highly reliable.
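To make the Naïve Bayes step more concrete, the following is a minimal sketch of how a multinomial Naïve Bayes classifier could assign a rhetorical relation label to a text fragment. It is not the authors' implementation: the class name, the two relation labels ("cause", "contrast"), the whitespace tokenization, the add-one smoothing, and the English toy fragments are all assumptions made for illustration. The actual dataset is Arabic and annotated under Rhetorical Structure Theory.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesRelationClassifier:
    """Hypothetical multinomial Naive Bayes over whitespace tokens (add-one smoothing)."""

    def fit(self, fragments, labels):
        # Count how often each label occurs and which tokens appear under it.
        self.label_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for text, label in zip(fragments, labels):
            self.token_counts[label].update(text.split())
        # Vocabulary = every token seen in any training fragment.
        self.vocab = {tok for counts in self.token_counts.values() for tok in counts}
        return self

    def predict(self, fragment):
        total = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n in self.label_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods per token.
            score = math.log(n / total)
            for tok in fragment.split():
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (invented for illustration, not taken from the paper's corpus).
fragments = [
    "because the road was closed",
    "because it rained heavily",
    "but the team kept playing",
    "but prices stayed high",
]
labels = ["cause", "cause", "contrast", "contrast"]

clf = NaiveBayesRelationClassifier().fit(fragments, labels)
print(clf.predict("because the shop was closed"))
print(clf.predict("but it stayed high"))
```

In this sketch the discourse connectives ("because", "but") dominate the decision, which mirrors the intuition that explicit markers are strong cues for rhetorical relations. A real Arabic pipeline would need proper tokenization and morphological normalization before counting tokens.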

Acknowledgement

The authors would like to acknowledge the support provided by AlMaarefa University while conducting this research work.
