• Title/Abstract/Keywords: vocal feature

52 search results, processing time 0.025 s

A Study on Speech Recognition using Vocal Tract Area Function

  • 송제혁;김동준
    • Korean Society of Medical and Biological Engineering: Journal of Biomedical Engineering Research
    • /
    • Vol. 16, No. 3
    • /
    • pp.345-352
    • /
    • 1995
  • The LPC cepstrum coefficients, an acoustic feature of the speech signal, have been widely used as the feature parameter in various speech recognition systems and have shown good performance. The vocal tract area function is an articulatory feature related to the physiological mechanism of speech production. This paper proposes the vocal tract area function as an alternative feature parameter for speech recognition. Linear predictive analysis using the Burg algorithm and vector quantization are performed. Recognition experiments on 5 Korean vowels and 10 digits are then carried out using both the conventional LPC cepstrum coefficients and the vocal tract area function. Recognition using the area function showed slightly better results than recognition using the conventional LPC cepstrum coefficients.

  • PDF
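The abstract above names two processing steps, linear predictive analysis by the Burg algorithm and a vocal tract area function, without giving formulas. The following is a minimal Python sketch of how such features are commonly derived, not the authors' implementation. The function names `burg_reflection` and `area_function` are hypothetical, and the area recursion assumes the convention k_i = (A_{i+1} - A_i) / (A_{i+1} + A_i) with a unit starting area; textbooks differ on the sign and direction of this relation.

```python
def burg_reflection(signal, order):
    """Estimate reflection coefficients k_1..k_order with Burg's method."""
    # Forward errors f[n] are paired with backward errors b[n-1].
    f = list(signal[1:])
    b = list(signal[:-1])
    coeffs = []
    for _ in range(order):
        num = -2.0 * sum(fi * bi for fi, bi in zip(f, b))
        den = sum(fi * fi for fi in f) + sum(bi * bi for bi in b)
        k = num / den if den > 0.0 else 0.0
        coeffs.append(k)
        # Lattice update, then re-align the two error sequences by one sample.
        new_f = [fi + k * bi for fi, bi in zip(f, b)]
        new_b = [bi + k * fi for fi, bi in zip(f, b)]
        f, b = new_f[1:], new_b[:-1]
    return coeffs


def area_function(coeffs, first_area=1.0):
    """Convert reflection coefficients into relative tube-section areas.

    Assumed convention: A_{i+1} = A_i * (1 + k_i) / (1 - k_i), with |k_i| < 1.
    """
    areas = [first_area]
    for k in coeffs:
        areas.append(areas[-1] * (1.0 + k) / (1.0 - k))
    return areas


# Example: a short decaying frame, analysed with a 2-section tube model.
frame = [1.0, 0.5, 0.25, 0.125]
print(area_function(burg_reflection(frame, 2)))
```

The resulting area vector could then be fed to a vector quantizer in place of LPC cepstrum coefficients, which is the comparison the abstract describes.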

A Study on Speech Recognition using Vocal Tract Area Function and Vector Quantization

  • 송제혁;김동준;박상희
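    • Korean Society of Medical and Biological Engineering: Conference Proceedings
    • /
    • Korean Society of Medical and Biological Engineering 1993 Fall Conference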
    • /
    • pp.171-174
    • /
    • 1993
  • We propose the vocal tract area function as the feature vector for speech recognition. The vocal tract area function is directly related to speech production. In this study, it not only reflects the mechanism of speech production but can also be used as an effective feature vector in speech recognition.

  • PDF

Vocal Effort Detection Based on Spectral Information Entropy Feature and Model Fusion

  • Chao, Hao;Lu, Bao-Yun;Liu, Yong-Li;Zhi, Hui-Lai
    • Journal of Information Processing Systems
    • /
    • Vol. 14, No. 1
    • /
    • pp.218-227
    • /
    • 2018
  • Vocal effort detection is important for both robust speech recognition and speaker recognition. In this paper, a spectral information entropy feature, which contains more salient information about the vocal effort level, is first proposed. Then, a model fusion method based on complementary models is presented to recognize the vocal effort level. Experiments are conducted on an isolated-word test set, and the results show that spectral information entropy performs best among the three kinds of features. The recognition accuracy over all vocal effort levels reaches 81.6%, which demonstrates the potential of the proposed method.
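The abstract does not define its spectral information entropy feature. As a rough illustration only, the sketch below computes the common Shannon entropy of a frame's normalized power spectrum; the function name `spectral_entropy` and the naive DFT are assumptions made for the example, and the paper's actual feature may differ (for instance by using sub-bands).

```python
import cmath
import math


def spectral_entropy(frame):
    """Shannon entropy (bits) of the normalized power spectrum of one frame."""
    n = len(frame)
    power = []
    # Naive DFT over the non-negative frequency bins 0..n//2.
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))
        power.append(abs(s) ** 2)
    total = sum(power)
    if total == 0.0:
        return 0.0
    probs = [p / total for p in power]
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


# A pure tone concentrates energy in one bin (low entropy);
# an impulse spreads energy evenly over all bins (high entropy).
tone = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
impulse = [1.0] + [0.0] * 15
print(spectral_entropy(tone), spectral_entropy(impulse))
```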

Why do Obstruents Neutralize in Syllable Final Position?

  • 양순임
    • Journal of the Korean Society of Phonetic Sciences and Speech Technology: Malsori
    • /
    • No. 41
    • /
    • pp.31-47
    • /
    • 2001
  • The purpose of this study is to explain the cause of obstruent neutralization in syllable-final position. Most previous phonological studies did not sufficiently reflect phonetic reality because they were limited to a binary feature system. Binary distinctive features cannot explain the cause of neutralization. To explain it, I use the multi-valued phonetic feature [vocal tract aperture], by which I mean the distance between articulators in the hold stage. In this study, I claim that the cause of neutralization is assimilation to [vocal tract aperture] 0 degree. As a consequence of this assimilation, the neutralized sounds become aplosives.

  • PDF

Recognition System Using Vocal-Cord Signal

  • 조관현;한문성;박준석;정영규
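    • Korean Institute of Electrical Engineers: Conference Proceedings
    • /
    • Korean Institute of Electrical Engineers 2005 Conference Proceedings, Information and Control Section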
    • /
    • pp.216-218
    • /
    • 2005
  • This paper presents a new approach to a noise-robust recognizer for a WPS interface. In noisy environments, speech recognition performance degrades rapidly. To solve this problem, we propose a recognition system that uses the vocal-cord signal instead of speech. The vocal-cord signal has low quality, but it is more robust to environmental noise than the speech signal. As a result, we obtained 75.21% accuracy using MFCC with CMS and 83.72% accuracy using ZCPA with RASTA.

  • PDF
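The abstract mentions MFCC with CMS (cepstral mean subtraction) as one of its front ends. The sketch below shows only the CMS step under simple assumptions: the input is a list of per-frame cepstral vectors that have already been computed, and the function name `cepstral_mean_subtraction` is hypothetical rather than taken from the paper.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over an utterance from every frame."""
    if not frames:
        return []
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]


# Two frames of 2-dimensional cepstra (made-up numbers for illustration).
print(cepstral_mean_subtraction([[1.0, 2.0], [3.0, 6.0]]))
```

Because a fixed channel adds a constant offset in the cepstral domain, removing the utterance mean cancels that offset, which is why CMS is a common choice for noisy or low-quality signals.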

Non-Surgical Management for Benign Vocal Fold Lesions

  • 이상혁
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • Vol. 26, No. 2
    • /
    • pp.97-100
    • /
    • 2015
  • Benign vocal fold lesions, such as vocal nodules, polyps and Reinke's edema, usually result from chronic voice overuse. Conservative management, such as voice therapy and pharmacotherapy, is used as the primary treatment. The main purpose of voice therapy is to identify and reduce voice misuse in order to achieve the optimal voice. However, complete resolution may not be possible in all patients after voice therapy. Furthermore, for some patients with voice-related occupations, voice rest and voice therapy are sometimes difficult, which makes the treatment hard to carry out. When conservative therapy is ineffective, laryngeal microsurgery can be performed under general anesthesia. However, potential complications following laryngeal suspension and violation of the layered structure of the vocal fold during surgery should be considered before surgery. In recent decades, emerging literature has demonstrated the potential usefulness of vocal fold steroid injection as an alternative treatment option for benign vocal fold lesions. The most advantageous feature of vocal fold steroid injection is that it maintains regional anti-inflammatory effects while preventing the potential systemic adverse effects of the steroid. Many non-surgical treatment methods can be conducted using different approaches in the office setting. They can be applied as an alternative treatment modality for the management of various benign vocal fold lesions.

  • PDF

Vocal Tract Normalization Using The Power Spectrum Warping

  • 유일수;김동주;노용완;홍광석
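    • Korean Institute of Electrical Engineers: Conference Proceedings
    • /
    • Korean Institute of Electrical Engineers 2003 Conference Proceedings, Information and Control Section A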
    • /
    • pp.215-218
    • /
    • 2003
  • Vocal tract normalization is known as a successful method for improving the accuracy of speech recognition. A low-complexity frequency warping procedure based on maximum likelihood has generally been applied for vocal tract normalization. In this paper, we propose a new power spectrum warping procedure that can improve vocal tract normalization performance over the frequency warping procedure. The method can be implemented simply by modifying the power spectrum of the filter bank in Mel-frequency cepstral coefficient (MFCC) analysis. An experimental study compared our proposed method with the well-known frequency warping method. The results show that power spectrum warping performs about 50% better in recognition than frequency warping.

  • PDF
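For orientation, the sketch below illustrates the conventional baseline the abstract compares against, frequency warping of a mel filter bank, and not the proposed power spectrum warping, whose details the abstract does not give. It assumes a simple linear warp by a speaker-dependent factor `alpha` and the widely used mel formula 2595 log10(1 + f/700); the function names are hypothetical.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def warped_center_frequencies(num_filters, sample_rate, alpha):
    """Mel-spaced filter centers (Hz), linearly warped by alpha and clipped."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    centers = [mel_to_hz(top * (i + 1) / (num_filters + 1))
               for i in range(num_filters)]
    return [min(alpha * c, nyquist) for c in centers]


# alpha = 1.0 leaves the filter bank unchanged; alpha > 1 shifts it upward.
print(warped_center_frequencies(5, 16000, 1.0))
print(warped_center_frequencies(5, 16000, 1.1))
```

In a maximum-likelihood setup, `alpha` would typically be chosen per speaker by scoring a small grid of candidate values against the acoustic model; that search is omitted here.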

A study on speech training aids for Deafs

  • 안상필;이재혁;윤태성;박상희
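    • Korean Institute of Electrical Engineers: Conference Proceedings
    • /
    • Korean Institute of Electrical Engineers 1990 Summer Conference Proceedings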
    • /
    • pp.47-50
    • /
    • 1990
  • Deaf people cannot speak as clearly as hearing people because they lack feedback on their own pronunciation; therefore, speech training is required. In this study, fundamental frequency, intensity, formant frequencies, a vocal tract graphic and the vocal tract area function, all extracted from the speech signal, are used as feature parameters. An AR model, whose coefficients are extracted using inverse filtering, is used as the speech generation model. To connect the vocal tract graphic with the speech parameters, articulation distances and articulation distance functions in 15 selected intervals are determined from the extracted vocal tract areas and formant frequencies.

  • PDF
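Two of the features listed in the abstract, fundamental frequency and intensity, can be illustrated with a short sketch. The abstract does not say how the authors extracted them, so the autocorrelation peak picking and RMS measure below are assumptions, as are the function names and the 50 to 500 Hz search range.

```python
import math


def estimate_f0(frame, sample_rate, f_min=50.0, f_max=500.0):
    """Estimate fundamental frequency (Hz) from the autocorrelation peak."""
    lo = max(int(sample_rate / f_max), 1)
    hi = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(lo, hi + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    # Returns 0.0 when no positive autocorrelation peak is found.
    return sample_rate / best_lag if best_lag else 0.0


def rms_intensity(frame):
    """Root-mean-square amplitude of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


# Synthetic 200 Hz tone sampled at 8 kHz (50 ms frame).
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]
print(estimate_f0(tone, 8000), rms_intensity(tone))
```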

Improvement of Vocal Detection Accuracy Using Convolutional Neural Networks

  • You, Shingchern D.;Liu, Chien-Hung;Lin, Jia-Wei
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 15, No. 2
    • /
    • pp.729-748
    • /
    • 2021
  • Vocal detection is one of the fundamental steps in musical information retrieval. Typically, the detection process consists of feature extraction and classification steps. Recently, neural networks have been shown to outperform traditional classifiers. In this paper, we report our study on how to further improve detection accuracy by carefully choosing the parameters of the deep network model. Through experiments, we conclude that a feature-classifier model is still better than an end-to-end model. The recommended model uses a spectrogram as the input plane, and the classifier is an 18-layer convolutional neural network (CNN). With this arrangement, the proposed model improves the accuracy on the Jamendo dataset from 91.8%, as reported in the existing literature, to 94.1%. As accuracy on this dataset already exceeds 90%, the improvement of 2.3% is difficult to achieve and valuable. If even higher accuracy is required, ensemble learning may be used. The recommended setting is a majority vote over seven of the proposed models, which increases the accuracy on the Jamendo dataset by about 1.1%.
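The ensemble step described above, a majority vote over seven models, can be sketched as follows. This is an illustrative example with made-up predictions, not the authors' code; the function name `majority_vote` and the label encoding (1 = vocal, 0 = non-vocal) are assumptions.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label sequences into one sequence by majority vote.

    predictions: one list of labels per model, all of equal length.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Seven hypothetical models labelling three audio segments.
models = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
]
print(majority_vote(models))
```

With an odd number of voters and two classes, ties cannot occur, which is one practical reason to pick a count such as seven.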

Application of Vocal Properties and Vocal Independent Features to Classifying Sasang Constitution

  • 김근호;강남식;구본초;김종열
    • Journal of Sasang Constitutional Medicine
    • /
    • Vol. 23, No. 4
    • /
    • pp.458-470
    • /
    • 2011
  • 1. Objectives: Vocal characteristics are commonly considered an important factor in determining the Sasang constitution and health condition. We sought a classification procedure that distinguishes the constitution objectively and quantitatively by analyzing the characteristics of a subject's voice free from noise and error. 2. Methods: In this study, we extract vocal features from voices selected using prior information, remove outliers, reduce correlated features, normalize the features according to gender and age, and build discriminant functions adapted to gender and age from these features to improve diagnostic accuracy. 3. Results and Conclusions: The discriminant functions achieved about 45% accuracy in classifying the constitution for every age interval and gender, and this diagnostic accuracy is meaningful as a result obtained from the voice alone.
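One step in the Methods above, normalizing features according to gender and age, can be illustrated with a per-group z-score. The abstract does not state which normalization the authors used, so the z-score, the function name `normalize_by_group`, the group labels and the sample values below are all assumptions for the sake of the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_by_group(values, groups):
    """Z-score each value against the mean and spread of its own group."""
    buckets = defaultdict(list)
    for v, g in zip(values, groups):
        buckets[g].append(v)
    stats = {g: (mean(vs), pstdev(vs)) for g, vs in buckets.items()}
    normalized = []
    for v, g in zip(values, groups):
        mu, sigma = stats[g]
        # A group with no spread carries no information; map it to 0.
        normalized.append((v - mu) / sigma if sigma > 0 else 0.0)
    return normalized


# Hypothetical pitch values (Hz) for two gender groups.
pitch = [100.0, 120.0, 200.0, 240.0]
group = ["male", "male", "female", "female"]
print(normalize_by_group(pitch, group))
```

After this step a value expresses how far a speaker deviates from peers of the same group, so a single discriminant function is not dominated by group-level differences such as average pitch.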