Automatic Categorization of Islamic Jurisprudential Legal Questions using Hierarchical Deep Learning Text Classifier

  • AlSabban, Wesam H. (Department of Information Systems, Umm Al-Qura University) ;
  • Alotaibi, Saud S. (Department of Information Systems, Umm Al-Qura University) ;
  • Farag, Abdullah Tarek (Speakol) ;
  • Rakha, Omar Essam (Faculty of Engineering, Ain Shams University) ;
  • Al Sallab, Ahmad A. (Faculty of Engineering, Cairo University) ;
  • Alotaibi, Majid (Department of Computer Engineering, Umm Al-Qura University)
  • Received : 2021.09.05
  • Published : 2021.09.30

Abstract

The Islamic jurisprudential legal system is an essential component of the Islamic religion that governs many aspects of Muslims' daily lives. This gives rise to many questions that require interpretation by qualified specialists, or Muftis, according to the main sources of legislation in Islam. Islamic jurisprudence is usually divided into branches, according to which questions can be categorized and classified. Such categorization has many applications: in automated question-answering systems, and in manual systems for routing a question to a Mufti who specializes in its topic. In this work we tackle the problem of automatic categorization of Islamic jurisprudential legal questions using deep learning techniques. We build a hierarchical deep learning model that first extracts features from the question text at two levels, word and sentence representations, followed by a text classifier that acts on the resulting question representation. To evaluate our model, we build and release the largest publicly available dataset of Islamic questions and answers, along with their topics, covering 52 topic categories. We evaluate different state-of-the-art deep learning models for both word and sentence embeddings, comparing recurrent and transformer-based techniques, and perform extensive ablation studies to show the effect of each model choice. Our hierarchical model builds on pre-trained models, taking advantage of recent advances in transfer learning techniques focused on the Arabic language.
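The two-level hierarchy described above (word representations pooled into sentence representations, pooled in turn into a question representation that feeds a topic classifier) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses random vectors with hypothetical dimensions in place of the pre-trained Arabic embeddings (e.g. AraVec or AraBERT) the paper evaluates, and mean pooling in place of learned recurrent or transformer encoders.

```python
import numpy as np

# Hypothetical dimensions; the paper's embeddings come from pre-trained
# Arabic models, stubbed here with random vectors for illustration.
VOCAB, EMB_DIM, NUM_CLASSES = 1000, 64, 52  # 52 topic categories, as in the dataset
rng = np.random.default_rng(0)
word_emb = rng.normal(size=(VOCAB, EMB_DIM))            # word-level representations
W_cls = rng.normal(size=(EMB_DIM, NUM_CLASSES)) * 0.1   # linear classifier weights

def encode_sentence(token_ids):
    """Sentence representation: pool the word embeddings of its tokens."""
    return word_emb[token_ids].mean(axis=0)

def encode_question(sentences):
    """Question representation: pool the sentence vectors."""
    return np.stack([encode_sentence(s) for s in sentences]).mean(axis=0)

def classify(sentences):
    """Return the predicted topic index among the NUM_CLASSES categories."""
    logits = encode_question(sentences) @ W_cls
    return int(np.argmax(logits))

question = [[1, 5, 9], [2, 7]]   # a question of two sentences, as token ids
topic = classify(question)
assert 0 <= topic < NUM_CLASSES
```

In the paper's setting, the pooling steps would be replaced by trained encoders (recurrent or transformer-based), and the word table by pre-trained Arabic embeddings fine-tuned via transfer learning.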

Acknowledgement

The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia for funding their research work through the project number 20-UQU-IF-P3-001.
