Automatic proficiency assessment of Korean speech read aloud by non-natives using bidirectional LSTM-based speech recognition

Oh, Yoo Rhee; Park, Kiyoung; Jeon, Hyung-Bae; Park, Jeon Gue

doi:10.4218/etrij.2019-0400

ETRI Journal

Volume 42, Issue 5 / Pages 761-772 / 2020 / 1225-6463 (pISSN) / 2233-7326 (eISSN)

Electronics and Telecommunications Research Institute (한국전자통신연구원)

Oh, Yoo Rhee (Artificial Intelligence Research Laboratory, Electronics and Telecommunications Research Institute);
Park, Kiyoung (Artificial Intelligence Research Laboratory, Electronics and Telecommunications Research Institute);
Jeon, Hyung-Bae (Artificial Intelligence Research Laboratory, Electronics and Telecommunications Research Institute);
Park, Jeon Gue (Artificial Intelligence Research Laboratory, Electronics and Telecommunications Research Institute)

Received : 2019.08.22
Accepted : 2019.12.23
Published : 2020.11.16

https://doi.org/10.4218/etrij.2019-0400


Abstract

This paper presents an automatic proficiency assessment method for Korean utterances read aloud by non-native speakers, using bidirectional long short-term memory (BLSTM)-based acoustic models (AMs) and speech data augmentation techniques. The method considers two scenarios: with and without prompted text. With prompted text, it performs (a) speech feature extraction, (b) forced alignment using a native AM and a non-native AM, and (c) linear regression-based scoring for five proficiency scores. Without prompted text, it additionally performs Korean speech recognition and subword un-segmentation to recover the missing text. Experimental results indicate that the method with prompted text improves performance on all scores compared with a method employing conventional AMs. In addition, the method without prompted text achieves fluency score performance comparable to that of the method with prompted text.
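
The linear regression-based scoring step in (c) can be illustrated with a minimal sketch. It assumes each utterance has already been reduced to a small feature vector derived from forced alignment. The abstract does not list the features, so the feature count, the synthetic data, and the function names fit_score_regressors and predict_scores below are hypothetical and are not taken from the paper. The sketch fits one ordinary least-squares model per proficiency score and is not the authors' implementation.

```python
import numpy as np


def add_bias(features):
    """Append a constant column so each model learns an intercept."""
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_score_regressors(features, scores):
    """Fit one least-squares linear model per proficiency score.

    features: (n_utterances, n_features) alignment-derived features (hypothetical).
    scores:   (n_utterances, n_scores) human-rated proficiency scores.
    Returns weights of shape (n_features + 1, n_scores); the last row is the bias.
    """
    weights, *_ = np.linalg.lstsq(add_bias(features), scores, rcond=None)
    return weights


def predict_scores(weights, features):
    """Predict all proficiency scores for each utterance."""
    return add_bias(features) @ weights


# Synthetic demonstration data (not from the paper): 200 utterances,
# 4 hypothetical features, 5 proficiency scores.
rng = np.random.default_rng(0)
n_utt, n_feat, n_scores = 200, 4, 5
features = rng.normal(size=(n_utt, n_feat))
true_w = rng.normal(size=(n_feat + 1, n_scores))
scores = add_bias(features) @ true_w + 0.01 * rng.normal(size=(n_utt, n_scores))

weights = fit_score_regressors(features, scores)
predicted = predict_scores(weights, features[:3])
```

In this sketch all five scores share the same input features and differ only in their regression weights. A real system might select a different feature subset per score.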
