HMM-based missing feature reconstruction for robust speech recognition in additive noise environments

  • Received : 2014.11.04
  • Accepted : 2014.12.13
  • Published : 2014.12.31

Abstract

This paper describes a robust speech recognition technique that reconstructs spectral components mismatched with the training environment. While the cluster-based reconstruction method compensates for unreliable components using the reliable components of the same spectral vector, under the assumption that training spectral vectors follow an independent, identically distributed Gaussian-mixture process, the presented method exploits the temporal dependency of speech by introducing a hidden-Markov-model prior that incorporates internal state transitions plausible for the observed spectral vector sequence. The experimental results indicate that the described method provides temporally consistent reconstruction and, on average, further improves recognition performance over the conventional method.
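The following is a minimal, hypothetical sketch (in Python with NumPy, not the authors' implementation) of the idea summarized above: unreliable log-spectral components of each frame are imputed from their state-conditional means, with per-frame state posteriors obtained from a forward pass over the frame sequence, so that the reconstruction respects the temporal structure imposed by an HMM prior. All names and parameters here are illustrative; a full system would typically also apply backward smoothing and bound-constrained estimation of the unreliable components.

```python
# Hypothetical sketch of HMM-prior missing-feature reconstruction.
# Each HMM state has a diagonal-covariance Gaussian over log-spectral vectors.
import numpy as np


def reconstruct_hmm(X, mask, means, variances, trans, init):
    """Impute unreliable components of a spectral vector sequence.

    X         : (T, D) observed log-spectral frames
    mask      : (T, D) boolean, True where a component is reliable
    means     : (S, D) state-conditional means
    variances : (S, D) state-conditional (diagonal) variances
    trans     : (S, S) state transition probabilities
    init      : (S,)   initial state probabilities
    """
    T, D = X.shape
    S = means.shape[0]
    X_hat = X.copy()
    alpha = np.zeros((T, S))  # forward (filtering) state posteriors

    for t in range(T):
        r = mask[t]
        # Likelihood of the reliable part of frame t under each state.
        diff = X[t, r] - means[:, r]
        loglik = -0.5 * np.sum(diff ** 2 / variances[:, r]
                               + np.log(2 * np.pi * variances[:, r]), axis=1)
        lik = np.exp(loglik - loglik.max())

        # Temporal prior from the previous frame's posterior and the
        # transition matrix (this is what the i.i.d. cluster model lacks).
        prior = init if t == 0 else alpha[t - 1] @ trans
        alpha[t] = prior * lik
        alpha[t] /= alpha[t].sum()

        # Replace unreliable components by posterior-weighted state means.
        X_hat[t, ~r] = alpha[t] @ means[:, ~r]

    return X_hat
```

Dropping the transition term, i.e., evaluating every frame independently with fixed mixture weights, reduces this sketch to the cluster-based reconstruction that serves as the conventional baseline in the paper.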

