Histogram Equalization Using Background Speakers' Utterances for Speaker Identification

Histogram Equalization Technique Using Background Speaker Data in Speaker Identification

  • Received : 2012.03.13
  • Accepted : 2012.06.19
  • Published : 2012.06.30

Abstract

In this paper, we propose a novel approach to improving histogram equalization (HEQ) for speaker identification. Our method pools all speech feature vectors from the universal background model (UBM) training data to form a reference distribution. The rank of each feature vector is computed within the sorted list of the pooled UBM training data and test data, and these ranks are used to perform order-based histogram equalization. The proposed method improves the accuracy of the speaker identification system on short utterances. We evaluate the proposed system on four speech databases and compare it with cepstral mean normalization (CMN), mean and variance normalization (MVN), and conventional HEQ. The proposed system reduced the error rate by 33.3% relative to the baseline system.
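The abstract does not spell out the mapping step, so the following is only a minimal sketch under stated assumptions, written in Python with NumPy. It implements the two baselines named above (CMN and MVN) and an order-based HEQ. The function `rank_heq` and its parameters are hypothetical. When `reference` (standing in for the pooled UBM training features) is given, each test value is ranked within the sorted union of reference and test values; otherwise it is ranked within the test utterance alone, as in conventional order-based HEQ. Mapping the rank-based CDF estimate onto a standard normal target, as in feature warping, is an assumption of this sketch and not necessarily the paper's choice.

```python
import numpy as np
from statistics import NormalDist


def cmn(feats):
    """Cepstral mean normalization: subtract the per-utterance mean of each dimension."""
    return feats - feats.mean(axis=0)


def mvn(feats, eps=1e-8):
    """Mean and variance normalization: zero mean and unit variance per dimension."""
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + eps)


def rank_heq(test, reference=None):
    """Order-based HEQ of a (T, D) test feature matrix (hypothetical sketch).

    reference: optional (N, D) background (UBM training) features.
    - reference is None: rank each value within the test utterance only.
    - otherwise: rank each test value within the pooled, sorted list of
      reference and test values for that dimension.
    ASSUMPTION: the target distribution is a standard normal N(0, 1).
    """
    T, D = test.shape
    target = NormalDist()  # assumed target distribution N(0, 1)
    out = np.empty_like(test, dtype=float)
    for d in range(D):
        col = test[:, d]
        if reference is None:
            pool = col
        else:
            pool = np.concatenate([reference[:, d], col])
        pool_sorted = np.sort(pool)
        # Zero-based rank: count of pooled values strictly below each test value.
        ranks = np.searchsorted(pool_sorted, col, side="left")
        # Midpoint CDF estimate keeps every value strictly inside (0, 1).
        cdf = (ranks + 0.5) / pool_sorted.size
        # Map each CDF value through the inverse CDF of the target distribution.
        out[:, d] = [target.inv_cdf(p) for p in cdf]
    return out
```

With a short utterance of T frames, the test-only ranks take at most T distinct values, so the estimated CDF is coarse; ranking against N + T pooled values places each test frame on a much finer grid, which is one plausible reason pooling background data could help on short utterances.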
