Performance comparison on vocal cords disordered voice discrimination via machine learning methods

  • Cheolwoo Jo (School of Electrical, Electronics and Control Engineering, Changwon National University)
  • Soo-Geun Wang (Department of Otolaryngology, Pusan National University Hospital)
  • Ickhwan Kwon (Department of Applied IT and Engineering, Graduate School, Pusan National University)
  • Received : 2022.08.01
  • Accepted : 2022.11.04
  • Published : 2022.12.31

Abstract

This paper studies how to improve the identification rate for laryngeal disordered-voice data using convolutional neural networks (CNNs) and machine learning ensemble methods. Laryngeal disordered-voice datasets are generally small, so a classifier built by statistical methods can overfit, depending on how it is trained, and its identification rate can drop when it is exposed to external data. In this work, we combine the outputs of CNN models and machine learning models trained to different accuracies by multi-voting, aiming for better classification performance than the individual trained models, and we also apply existing machine learning ensemble methods and compare the results. The Pusan National University Hospital (PNUH) dataset, which contains normal voices and voices of patients with benign and malignant tumors, was used to train and validate the algorithms. The experiment attempted to distinguish normal and benign-tumor voices from malignant-tumor voices. The random forest method proved to be the best ensemble method, with an identification rate of 85%.
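
As an illustration of the multi-voting idea described in the abstract, the following is a minimal sketch of hard majority voting over the label predictions of several classifiers. It is not the authors' implementation: the function names `majority_vote` and `accuracy`, the tie-breaking rule, the class labels 0 and 1, and all prediction values are hypothetical and chosen only to show the mechanism.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Fuse hard label predictions from several models by majority vote.

    predictions[m][i] is the label that model m assigns to sample i.
    Ties are broken in favour of the smallest label (an assumption made
    for this sketch, not a rule taken from the paper).
    """
    fused = []
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)
        top = max(counts.values())
        fused.append(min(label for label, c in counts.items() if c == top))
    return fused


def accuracy(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


# Hypothetical ground truth and model outputs for five recordings.
# Labels 0 and 1 are placeholder class indices.
truth = [0, 1, 1, 0, 1]
model_outputs = [
    [0, 1, 1, 0, 0],  # e.g. a CNN, wrong on the last sample
    [0, 0, 1, 0, 1],  # e.g. a second model, wrong on the second sample
    [1, 1, 1, 0, 1],  # e.g. a third model, wrong on the first sample
]

fused = majority_vote(model_outputs)
print(fused)                                        # [0, 1, 1, 0, 1]
print([accuracy(m, truth) for m in model_outputs])  # [0.8, 0.8, 0.8]
print(accuracy(fused, truth))                       # 1.0
```

In this toy case each model errs on a different sample, so the vote corrects every individual mistake. When the models make the same mistakes, voting gives no such gain, which is why combining models with varied behaviour matters.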

Acknowledgement

This work was supported by the Changwon National University autonomous research project fund for 2021-2022.
