• Title/Summary/Keyword: vocal feature

A Study on Speech Recognition using Vocal Tract Area Function (성도 면적 함수를 이용한 음성 인식에 관한 연구)

  • Song, Jei-Hyuck;Kim, Dong-Jun
    • Journal of Biomedical Engineering Research / v.16 no.3 / pp.345-352 / 1995
  • The LPC cepstrum coefficients, which are an acoustic feature of the speech signal, have been widely used as the feature parameter in various speech recognition systems and have shown good performance. The vocal tract area function is an articulatory feature that is related to the physiological mechanism of speech production. This paper proposes the vocal tract area function as an alternative feature parameter for speech recognition. Linear predictive analysis using the Burg algorithm and vector quantization are performed. Recognition experiments on 5 Korean vowels and 10 digits are then carried out using both the conventional LPC cepstrum coefficients and the vocal tract area function. Recognition using the area function showed slightly better results than recognition using the conventional LPC cepstrum coefficients.
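The pipeline named in this abstract (Burg linear prediction followed by a vocal tract area function) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `burg_reflection` and `area_function`, the unit lip area, and the sign convention relating reflection coefficients to area ratios are all assumptions, and other texts use the opposite sign.

```python
import numpy as np

def burg_reflection(x, order):
    """Estimate reflection (PARCOR) coefficients with Burg's method."""
    x = np.asarray(x, dtype=float)
    f = x[1:].copy()   # forward prediction error, samples 1..N-1
    b = x[:-1].copy()  # backward prediction error, delayed by one sample
    k = np.zeros(order)
    for m in range(order):
        den = np.dot(f, f) + np.dot(b, b)
        # Burg's estimate minimizes forward + backward error energy.
        k[m] = -2.0 * np.dot(f, b) / den if den > 0 else 0.0
        f_new = f + k[m] * b
        b_new = b + k[m] * f
        # Re-align the two error sequences for the next lattice stage.
        f, b = f_new[1:], b_new[:-1]
    return k

def area_function(k, lip_area=1.0):
    """Map reflection coefficients to relative tube areas.

    Assumed convention: A[i+1] = A[i] * (1 + k[i]) / (1 - k[i]),
    starting from a normalized area at the lips.
    """
    areas = [lip_area]
    for ki in k:
        areas.append(areas[-1] * (1.0 + ki) / (1.0 - ki))
    return np.array(areas)
```

Because Burg's method keeps every reflection coefficient within magnitude 1, the resulting areas stay positive, which is what makes them usable as a feature vector before vector quantization.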

A Study on Speech Recognition using Vocal Tract Area function and Vector Quantization (성도 면적 함수와 벡터 양자화를 이용한 음성 인식에 관한 연구)

  • Song, Jei-Hyuck;Kim, Dong-Jun;Park, Sang-Hui
    • Proceedings of the KOSOMBE Conference / v.1993 no.11 / pp.171-174 / 1993
  • We propose the vocal tract area function as the feature vector for speech recognition. The vocal tract area function is directly related to speech production. In this study, it not only reflects the mechanism of speech production but can also be used as an effective feature vector in speech recognition.

Vocal Effort Detection Based on Spectral Information Entropy Feature and Model Fusion

  • Chao, Hao;Lu, Bao-Yun;Liu, Yong-Li;Zhi, Hui-Lai
    • Journal of Information Processing Systems / v.14 no.1 / pp.218-227 / 2018
  • Vocal effort detection is important for both robust speech recognition and speaker recognition. In this paper, a spectral information entropy feature, which contains more salient information about the vocal effort level, is first proposed. Then, a model fusion method based on complementary models is presented to recognize the vocal effort level. Experiments are conducted on an isolated-word test set, and the results show that spectral information entropy performs best among the three kinds of features. The recognition accuracy over all vocal effort levels reaches 81.6%, which demonstrates the potential of the proposed method.
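A spectral entropy feature of the kind mentioned in this abstract can be illustrated per frame as below. This is a hedged sketch of the general idea (Shannon entropy of the normalized power spectrum), not the paper's exact feature: the function name `spectral_entropy`, the Hann window, the natural logarithm, and the `eps` guard are assumptions.

```python
import numpy as np

def spectral_entropy(frame, eps=1e-12):
    """Shannon entropy of one frame's normalized power spectrum."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(len(frame))    # assumed window choice
    power = np.abs(np.fft.rfft(windowed)) ** 2   # one-sided power spectrum
    p = power / (power.sum() + eps)              # treat bins as probabilities
    return float(-np.sum(p * np.log(p + eps)))
```

A frame whose energy is concentrated in a few bins (a pure tone, for instance) yields a low value, while a flat, noise-like spectrum yields a high one.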

Why do Obstruents Neutralize in Syllable Final Position? (음절말 자음 중화의 원인)

  • Yang Sun-Im
    • MALSORI / no.41 / pp.31-47 / 2001
  • The purpose of this study is to explain the cause of obstruent neutralization in syllable-final position. Most previous phonological studies did not sufficiently reflect phonetic reality because they were limited to the binary feature system. Using binary distinctive features, we cannot explain the cause of neutralization. To explain it, I use the multi-valued phonetic feature [vocal tract aperture], by which I mean the distance between articulators in the hold stage. In this study, I claim that the cause of neutralization is assimilation to [vocal tract aperture] 0 degree. The neutralized sounds become aplosives as a consequence of assimilation to [vocal tract aperture].

RECOGNITION SYSTEM USING VOCAL-CORD SIGNAL (성대 신호를 이용한 인식 시스템)

  • Cho, Kwan-Hyun;Han, Mun-Sung;Park, Jun-Seok;Jeong, Young-Gyu
    • Proceedings of the KIEE Conference / 2005.10b / pp.216-218 / 2005
  • This paper presents a new approach to a noise-robust recognizer for a WPS interface. In noisy environments, speech recognition performance degrades rapidly. To solve this problem, we propose a recognition system that uses the vocal-cord signal instead of speech. The vocal-cord signal has low quality, but it is more robust to environmental noise than the speech signal. As a result, we obtained 75.21% accuracy using MFCC with CMS and 83.72% accuracy using ZCPA with RASTA.
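For readers unfamiliar with the CMS step paired with MFCC above, the following is a minimal sketch, assuming CMS here means cepstral mean subtraction in its usual per-utterance form. The function name `cepstral_mean_subtraction` and the (frames x coefficients) array layout are assumptions, and the MFCC extraction itself is not shown.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Subtract the per-utterance mean from each cepstral coefficient.

    cepstra: array of shape (n_frames, n_coeffs) for a single utterance.
    """
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)
```

A fixed channel or sensor response appears as a constant offset in the cepstral domain, so removing the mean cancels it, which is one reason the technique is used for noise and channel robustness.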

Non-Surgical Management for Benign Vocal Fold Lesions (양성 성대 병변의 비수술적 치료)

  • Lee, Sang Hyuk
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.26 no.2 / pp.97-100 / 2015
  • Benign vocal fold lesions, such as vocal nodules, polyps and Reinke's edema, usually result from chronic voice overuse. Conservative management, such as voice therapy and pharmacotherapy, is used as the primary treatment. The main purpose of voice therapy is to identify and reduce voice misuse in order to achieve the optimal voice. However, complete resolution may not be possible in all patients after voice therapy. Furthermore, for some patients with voice-related occupations, voice rest and voice therapy are difficult, which makes the treatment hard to carry out. When conservative therapy is ineffective, laryngeal microsurgery can be performed under general anesthesia. However, potential complications following laryngeal suspension and violation of the layered structure of the vocal fold during surgery should be considered before surgery. In recent decades, emerging literature has demonstrated the potential usefulness of vocal fold steroid injection as an alternative treatment option for benign vocal fold lesions. The most advantageous feature of vocal fold steroid injection is that it maintains regional anti-inflammatory effects while preventing the potential systemic adverse effects of the steroid. Many non-surgical treatment methods can be conducted using different approaches in the office setting, and they can be applied as an alternative treatment modality for the management of various benign vocal fold lesions.

Vocal Tract Normalization Using The Power Spectrum Warping (파워 스펙트럼 warping을 이용한 성도 정규화)

  • Yu, Il-Su;Kim, Dong-Ju;No, Yong-Wan;Hong, Gwang-Seok
    • Proceedings of the KIEE Conference / 2003.11b / pp.215-218 / 2003
  • Vocal tract normalization is known as a successful method for improving the accuracy of speech recognition. A frequency warping procedure based on low complexity and maximum likelihood has generally been applied for vocal tract normalization. In this paper, we propose a new power spectrum warping procedure that can improve vocal tract normalization performance over the frequency warping procedure. The method can be implemented simply by modifying the power spectrum of the filter bank in Mel-frequency cepstrum (MFCC) feature analysis. An experimental study compared the proposed method with the well-known frequency warping method. The results show that power spectrum warping gives about 50% better recognition performance than frequency warping.

A study on speech training aids for Deafs (청각장애자용 발음훈련기기 개발에 관한 연구)

  • Ahn, Sang-Pil;Lee, Jae-Hyuk;Yoon, Tae-Sung;Park, Sang-Hui
    • Proceedings of the KIEE Conference / 1990.07a / pp.47-50 / 1990
  • Deaf people cannot speak as clearly as hearing people because they lack feedback on their own pronunciation; speech training is therefore required. In this study, the fundamental frequency, intensity, formant frequencies, vocal tract graphic and vocal tract area function, all extracted from the speech signal, are used as feature parameters. An AR model, whose coefficients are extracted using inverse filtering, is used as the speech generation model. To connect the vocal tract graphic with the speech parameters, articulation distances and articulation distance functions in 15 selected intervals are determined from the extracted vocal tract areas and formant frequencies.

Improvement of Vocal Detection Accuracy Using Convolutional Neural Networks

  • You, Shingchern D.;Liu, Chien-Hung;Lin, Jia-Wei
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.2 / pp.729-748 / 2021
  • Vocal detection is one of the fundamental steps in musical information retrieval. Typically, the detection process consists of feature extraction and classification steps. Recently, neural networks have been shown to outperform traditional classifiers. In this paper, we report our study on how to further improve detection accuracy by carefully choosing the parameters of the deep network model. Through experiments, we conclude that a feature-classifier model is still better than an end-to-end model. The recommended model uses a spectrogram as the input plane, and the classifier is an 18-layer convolutional neural network (CNN). With this arrangement, the proposed model improves the accuracy on the Jamendo dataset from 91.8% in the existing literature to 94.1%. Because accuracy on this dataset already exceeds 90%, the 2.3% improvement is difficult to achieve and therefore valuable. If even higher accuracy is required, ensemble learning may be used. The recommended setting is a majority vote over seven of the proposed models, which increases the accuracy on the Jamendo dataset by about 1.1%.
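The majority-vote ensemble described at the end of this abstract can be sketched as follows. This is only an illustration of the voting step, under the assumption that each model emits one binary vocal/non-vocal label per frame; the function name `majority_vote` and the array layout are hypothetical, and the CNN models themselves are not shown.

```python
import numpy as np

def majority_vote(predictions):
    """Combine binary labels from several models by strict majority.

    predictions: array of shape (n_models, n_frames) holding 0/1 labels.
    """
    preds = np.asarray(predictions)
    votes = preds.sum(axis=0)                       # count of "vocal" votes
    return (2 * votes > preds.shape[0]).astype(int) # more than half wins

# Seven hypothetical models voting on three frames.
labels = majority_vote([
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
])
```

Using an odd number of models, such as the seven recommended in the abstract, means a binary vote can never tie.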

Application of Vocal Properties and Vocal Independent Features to Classifying Sasang Constitution (음성 특성 및 음성 독립 변수의 사상체질 분류로의 적용 방법)

  • Kim, Keun-Ho;Kang, Nam-Sik;Ku, Bon-Cho;Kim, Jong-Yeol
    • Journal of Sasang Constitutional Medicine / v.23 no.4 / pp.458-470 / 2011
  • 1. Objectives: Vocal characteristics are commonly considered an important factor in determining the Sasang constitution and health condition. We sought a classification procedure that distinguishes the constitution objectively and quantitatively by analyzing the characteristics of a subject's voice without noise and error. 2. Methods: In this study, we extract vocal features from voice samples selected using prior information, remove outliers, minimize correlated features, normalize the features according to gender and age, and build discriminant functions adapted to gender and age from the features in order to improve diagnostic accuracy. 3. Results and Conclusions: The discriminant functions achieved about 45% accuracy in classifying the constitution for every age interval and gender, and this diagnostic accuracy is meaningful given that it was obtained from the voice alone.