Sound Model Generation using Most Frequent Model Search for Recognizing Animal Vocalization

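Korean title: 최대 빈도모델 탐색을 이용한 동물소리 인식용 소리모델생성 (Sound Model Generation for Animal Sound Recognition Using Most Frequent Model Search)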

  • Ko, Youjung (Department of Computer Engineering, Hanbat National University)
  • Kim, Yoonjoong (Department of Computer Engineering, Hanbat National University)
  • Received : 2017.01.31
  • Accepted : 2017.02.04
  • Published : 2017.02.28

Abstract

This paper proposes a sound model generation algorithm and a most frequent model search algorithm for recognizing animal vocalizations. The sound model generation algorithm produces an optimal set of models by repeating a training process, a Viterbi search process, and a most frequent model search process while adjusting the HMM (Hidden Markov Model) structure to improve the overall recognition rate. The most frequent model search algorithm scans the list of models produced by the Viterbi search, finds the model that occurs most often, and takes it as the final recognition result. The system uses MFCCs (Mel Frequency Cepstral Coefficients) as the sound feature and HMMs as the model, and is implemented in C#. To evaluate the algorithm, animal sounds from 27 species were prepared; in the experiment, the sound model generation algorithm produced a set of 27 HMMs with a recognition rate of 97.29 percent.
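The most frequent model search step can be illustrated with a minimal sketch. It assumes the Viterbi search yields a flat list of model labels for one input sound, and that ties go to the label seen first; the abstract specifies neither detail, and the function name and labels below are hypothetical. Python is used here for brevity, whereas the paper's implementation is in C#.

```python
from collections import Counter


def most_frequent_model(model_list):
    """Return the label that occurs most often in a decoded model list.

    Assumption: `model_list` is a flat list of model labels, one per
    segment recognized by the Viterbi search. Ties are resolved in favor
    of the label that appears first (Counter preserves insertion order).
    """
    if not model_list:
        raise ValueError("model_list must contain at least one label")
    counts = Counter(model_list)
    label, _count = counts.most_common(1)[0]
    return label


# Hypothetical decoder output for a single recording.
decoded = ["owl", "frog", "owl", "cricket", "owl", "frog"]
decision = most_frequent_model(decoded)
```

Taking the most frequent label makes the final decision robust to a few misrecognized segments, provided that the correct model still dominates the list.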

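To support an animal sound recognition system, this paper devises a most frequent model search algorithm and proposes a method that uses it to generate sound models. The sound model generation method repeats a training process, a Viterbi search process, and a most frequent model search process on the sound data of each animal species while searching over the structure of the HMM (Hidden Markov Model), namely the number of states and the number of GMMs, so that it generates the model set with the best recognition rate. The most frequent model search algorithm decodes the input sound data with the Viterbi algorithm to produce a list of models, then searches this list for the model with the highest frequency and takes it as the final recognition result. The algorithm uses MFCCs (Mel Frequency Cepstral Coefficients) as the sound feature and HMMs as the model type, and is implemented in the C# programming language. To evaluate its performance, sounds from 27 animal species were selected and tested, and the experiment confirmed that a set of 27 HMMs was generated with a recognition rate of 97.29 percent.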
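The structure search over the number of states and the number of GMMs can be pictured as a loop that scores each candidate structure and keeps the best one. The sketch below is only an illustration under stated assumptions: it tries every combination exhaustively, which the abstract does not confirm, and `evaluate` is a hypothetical callable standing in for the training, Viterbi search, and most frequent model search steps. It is written in Python rather than the paper's C#.

```python
from itertools import product


def search_structure(evaluate, state_counts, mixture_counts):
    """Return (rate, n_states, n_mixtures) for the best-scoring structure.

    Assumption: `evaluate(n_states, n_mixtures)` trains models with the
    given structure and returns the resulting recognition rate in [0, 1].
    The first structure reaching the best rate is kept.
    """
    best = None
    for n_states, n_mixtures in product(state_counts, mixture_counts):
        rate = evaluate(n_states, n_mixtures)
        if best is None or rate > best[0]:
            best = (rate, n_states, n_mixtures)
    if best is None:
        raise ValueError("no candidate structures were supplied")
    return best


def toy_evaluate(n_states, n_mixtures):
    """Stand-in scorer (not real training): peaks at 5 states, 2 mixtures."""
    return 1.0 - 0.1 * abs(n_states - 5) - 0.05 * abs(n_mixtures - 2)


best_rate, best_states, best_mixtures = search_structure(
    toy_evaluate, state_counts=range(3, 8), mixture_counts=range(1, 5)
)
```

In a real system `evaluate` would be the costly part, since each call retrains and re-scores the models, so the candidate ranges would normally be kept small.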
