Artificial Intelligence for Clinical Research in Voice Disease

Artificial Intelligence Research on Laryngeal Voice Disease

  • Jungirl Seok (석준걸; Department of Otorhinolaryngology-Head and Neck Surgery, National Cancer Center)
  • Tack-Kyun Kwon (권택균; Department of Otorhinolaryngology-Head and Neck Surgery, Seoul National University College of Medicine)
  • Received : 2022.10.11
  • Accepted : 2022.11.15
  • Published : 2022.12.31

Abstract

Voice-based diagnosis is non-invasive and can be implemented with a wide range of voice recording devices; it can therefore serve as a screening or diagnostic support tool that helps clinicians assess laryngeal voice disease. Artificial intelligence algorithms, from conventional machine learning to the latest deep learning techniques, were first applied to binary classification of normal versus pathological voices and have since helped improve the accuracy of multi-class classification across different types of pathological voices. However, no results that can be applied directly in clinical practice have yet been achieved. Most studies on pathological voice classification have used the sustained vowel /ah/, which is easier to analyze than continuous (running) speech. Continuous speech, however, has the potential to yield more accurate results because additional information can be obtained from changes in the voice signal over time. This review explains key terms in artificial intelligence research, summarizes recent trends in machine learning and deep learning algorithms, and presents the latest research results and their limitations to suggest future directions for researchers.

Acknowledgement

This study was supported by a grant from the National Cancer Center in Korea (No. NCC2212460-1).
