• Title/Abstract/Keywords: Hindi

Korean Speakers' Perception of Hindi Stop Consonants

  • 안현기
    • Phonetics and Speech Sciences (말소리와 음성과학)
    • /
    • Vol. 1, No. 3
    • /
    • pp.57-63
    • /
    • 2009
  • The two specific research questions pursued in this paper are: (i) how Korean speakers perceive Hindi stops in terms of the three laryngeal categories of Korean stops; (ii) how well Korean speakers do with an ABX perception test that utilizes a total of 52 Hindi minimal pairs where all sounds are identical except for the laryngeal features of a stop in each word. A total of 45 university students participated in this experiment. The results showed that (i) Koreans tended to perceive Hindi voiceless unaspirated stops as Korean fortis ones, voiceless aspirated stops as aspirated ones, voiced stops as lenis ones, and breathy stops as aspirated ones, and (ii) Koreans had difficulty in distinguishing between voiceless aspirated and breathy stops in Hindi.

Hindi Correspondence of Bengali Nominal Suffixes

  • Chatterji, Sanjay
    • Journal of Multimedia Information System
    • /
    • Vol. 8, No. 4
    • /
    • pp.221-232
    • /
    • 2021
  • One bottleneck of Bengali-to-Hindi transfer-based machine translation systems is the translation of noun suffixes. The appropriate translation of a nominal suffix often depends on the semantic role of the corresponding noun chunk in the sentence. With a high-performance Bengali morphological analyzer and a basic Bengali parser, it is possible to identify the role of each noun chunk. This information may be used to build rules for translating the ambiguous nominal suffixes. Because Bengali and Hindi nominal suffixes are used in similar ways, we find that the rules may be identified by linguistically analyzing corpus data. In this paper, we identify rules for four ambiguous Bengali nominal suffixes from corpus data and evaluate their performance. This set of rules resolves a majority of the nominal suffix ambiguities in a Bengali-to-Hindi transfer-based machine translation system. Using the rules, we translate 98.17% of Bengali nouns correctly, which is much better than the baseline ILMT system's accuracy of 62.8%.

Part-of-speech Tagging for Hindi Corpus in Poor Resource Scenario

  • Modi, Deepa;Nain, Neeta;Nehra, Maninder
    • Journal of Multimedia Information System
    • /
    • Vol. 5, No. 3
    • /
    • pp.147-154
    • /
    • 2018
  • Natural language processing (NLP) is an emerging research area that studies how machines can be used to perceive and alter text written in natural languages. Different tasks can be performed on natural languages by analyzing them through annotation tasks such as parsing, chunking, part-of-speech tagging, and lexical analysis. These annotation tasks depend on the morphological structure of the particular language. The focus of this work is part-of-speech (POS) tagging for Hindi. POS tagging, also known as grammatical tagging, is the process of assigning a grammatical category to each word of a given text. These categories can be noun, verb, time, date, number, etc. Hindi is the most widely used and official language of India. It is also among the five most spoken languages in the world. A diverse range of POS taggers is available for English and other languages, but they cannot be applied to Hindi, because Hindi is one of the most morphologically rich languages and its morphological structure differs significantly from theirs. This work therefore presents a POS tagger for Hindi based on a hybrid approach that combines probability-based and rule-based methods. Known words are tagged with a unigram probability model, whereas unknown words are tagged using various lexical and contextual features. Finite state automata are constructed to represent the rules, and regular expressions are then used to implement them. A tagset containing 29 standard part-of-speech tags is also prepared for this task. The tagset includes two unique tags, a date tag and a time tag, which support all possible formats. Regular expressions implement all pattern-based tags, such as time, date, number, and special symbols. The aim of the presented approach is to increase the correctness of automatic Hindi POS tagging while limiting the need for a large human-made corpus. The hybrid approach uses the probability-based model to increase automatic tagging and the rule-based model to limit the need for an already trained corpus. The approach relies on a very small labeled training set (around 9,000 words) and yields a best precision of 96.54% and an average precision of 95.08%. It also yields a best accuracy of 91.39% and an average accuracy of 88.15%.
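
The hybrid idea in this abstract (a unigram lookup for known words, regular expressions for pattern-based tags) can be sketched roughly as follows. This is a minimal illustration, not the authors' tagger: the tag names, the regular expressions, the toy training data, and the `UNK` fallback are all assumptions, and the lexical and contextual features the paper uses for unknown words are not modeled here.

```python
import re
from collections import Counter, defaultdict

# Hypothetical pattern rules; the paper's actual expressions are not given
# in the abstract. Order matters: more specific patterns come first.
PATTERN_RULES = [
    ("DATE", re.compile(r"\d{1,2}[/-]\d{1,2}[/-]\d{2,4}")),
    ("TIME", re.compile(r"\d{1,2}:\d{2}(:\d{2})?")),
    ("NUM", re.compile(r"\d+([.,]\d+)*")),
    ("SYM", re.compile(r"[^\w\s]+")),
]


def train_unigram(tagged_words):
    """Map each known word to its most frequent tag in the training data."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag_tokens(tokens, unigram, default_tag="UNK"):
    """Tag known words by unigram lookup, otherwise try the pattern rules."""
    tagged = []
    for token in tokens:
        if token in unigram:
            tagged.append((token, unigram[token]))
            continue
        for name, pattern in PATTERN_RULES:
            if pattern.fullmatch(token):
                tagged.append((token, name))
                break
        else:
            # No rule matched: a real system would apply lexical and
            # contextual features here instead of a fixed fallback tag.
            tagged.append((token, default_tag))
    return tagged
```

Because the pattern rules need no training data, they cover dates, times, and numbers even when the labeled corpus is very small, which is the trade-off the abstract describes.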

Optical Character Recognition for Hindi Language Using a Neural-network Approach

  • Yadav, Divakar;Sanchez-Cuadrado, Sonia;Morato, Jorge
    • Journal of Information Processing Systems
    • /
    • Vol. 9, No. 1
    • /
    • pp.117-140
    • /
    • 2013
  • Hindi is the most widely spoken language in India, with more than 300 million speakers. Because characters in Hindi text are not separated as they are in English, Optical Character Recognition (OCR) systems developed for Hindi have a very poor recognition rate. In this paper we propose an OCR system for printed Hindi text in Devanagari script that uses an Artificial Neural Network (ANN) to improve recognition efficiency. One of the major reasons for the poor recognition rate is error in character segmentation. The presence of touching characters in scanned documents further complicates the segmentation process, creating a major problem when designing an effective character segmentation technique. Preprocessing, character segmentation, feature extraction, and finally classification and recognition are the major steps followed by a general OCR system. The preprocessing tasks considered in the paper are conversion of grayscale images to binary images, image rectification, and segmentation of the document's textual contents into paragraphs, lines, words, and then basic symbols. The basic symbols, obtained as the fundamental unit from the segmentation process, are recognized by the neural classifier. In this work, three feature extraction techniques (histogram of projection based on mean distance, histogram of projection based on pixel value, and vertical zero crossing) are used to improve the recognition rate. These techniques are powerful enough to extract features of even distorted characters and symbols. The neural classifier is a back-propagation neural network with two hidden layers. The classifier is trained and tested on printed Hindi texts. A correct recognition rate of approximately 90% is achieved.
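
As a rough illustration of two of the feature types named in this abstract, the sketch below computes a pixel-count projection histogram and a per-column crossing count for a binary glyph image. The exact definitions used in the paper are not given in the abstract, so the interpretations here (foreground pixels counted per column, value changes counted while scanning each column top to bottom) and all function names are assumptions.

```python
def column_projection(image):
    """Count foreground (1) pixels in each column of a binary image."""
    return [sum(column) for column in zip(*image)]


def vertical_crossings(image):
    """Count value changes (0->1 or 1->0) scanning each column top to bottom."""
    return [
        sum(1 for above, below in zip(column, column[1:]) if above != below)
        for column in zip(*image)
    ]


def feature_vector(image):
    """Concatenate both per-column features into one flat vector."""
    return column_projection(image) + vertical_crossings(image)


# A toy 4x4 glyph: a top bar (like the Devanagari headline) over a stem.
glyph = [
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
features = feature_vector(glyph)
```

A vector like `features` would then be the input to a classifier such as the two-hidden-layer back-propagation network mentioned in the abstract.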

Automatic extraction of similar poetry for study of literary texts: An experiment on Hindi poetry

  • Prakash, Amit;Singh, Niraj Kumar;Saha, Sujan Kumar
    • ETRI Journal
    • /
    • Vol. 44, No. 3
    • /
    • pp.413-425
    • /
    • 2022
  • The study of literary texts is one of the earliest disciplines practiced around the globe. Poetry is artistic writing in which words are carefully chosen and arranged for their meaning, sound, and rhythm. Poetry usually has a broad and profound sense that makes it difficult to interpret, even for humans. The essence of poetry is Rasa, which signifies mood or emotion. In this paper, we propose a poetry classification-based approach to automatically extract similar poems from a repository. Specifically, we perform a novel Rasa-based classification of Hindi poetry. For the task, we primarily used lexical features in a bag-of-words model trained using the support vector machine classifier. In the model, we employed Hindi WordNet, Latent Semantic Indexing, and Word2Vec-based neural word embedding. To extract the rich feature vectors, we prepared a repository containing 37,717 poems collected from various sources. We evaluated the performance of the system on a manually constructed dataset containing 945 Hindi poems. Experimental results demonstrated that the proposed model attained satisfactory performance.

Hindi version of short form of douleur neuropathique 4 (S-DN4) questionnaire for assessment of neuropathic pain component: a cross-cultural validation study

  • Gudala, Kapil;Ghai, Babita;Bansal, Dipika
    • The Korean Journal of Pain
    • /
    • Vol. 30, No. 3
    • /
    • pp.197-206
    • /
    • 2017
  • Background: Pain with neuropathic characteristics is generally more severe and associated with a lower quality of life compared to nociceptive pain (NcP). The short form of the Douleur Neuropathique en 4 Questions (S-DN4) is one of the most used and reliable screening questionnaires and is reported to have good diagnostic properties. This study aimed to cross-culturally validate the Hindi version of the S-DN4 in patients with various chronic pain conditions. Methods: The S-DN4 has already been translated into Hindi by Mapi Research Trust. This study assessed the psychometric properties of the Hindi version of the S-DN4, including internal consistency and test-retest reliability 3 days after the baseline assessment. Diagnostic performance was also assessed. Results: One hundred sixty patients with chronic pain, 80 each in the neuropathic pain (NeP) present and NeP absent groups, were recruited. Patients with NeP present reported significantly higher S-DN4 scores than patients in the NeP absent group (mean (SD), 4.7 (1.7) vs. 1.8 (1.6), P < 0.01). The S-DN4 was found to have an AUC of 0.88 with adequate internal consistency (Cronbach's α = 0.80) and test-retest reliability (ICC = 0.92), with an optimal cut-off value of 3 (Youden's index = 0.66, sensitivity and specificity of 88.7% and 77.5%). The diagnostic concordance rate between clinician diagnosis and the S-DN4 questionnaire was 83.1% (kappa = 0.66). Conclusions: Overall, the Hindi version of the S-DN4 has good internal consistency and test-retest reliability along with good diagnostic accuracy.
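
The diagnostic figures in this abstract are related by standard formulas: Youden's index is sensitivity + specificity − 1, and Cohen's kappa compares observed agreement with the agreement expected by chance. The sketch below recomputes them from a 2×2 table. The cell counts (71 true positives, 9 false negatives, 62 true negatives, 18 false positives) are not stated in the abstract; they are an assumption inferred from the reported group sizes (80 each) and percentages.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute screening-test summary statistics from 2x2 table counts."""
    total = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    youden = sensitivity + specificity - 1
    # Observed agreement between the questionnaire and the clinician.
    concordance = (tp + tn) / total
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / total**2
    kappa = (concordance - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": youden,
        "concordance": concordance,
        "kappa": kappa,
    }


# Counts inferred from the abstract (an assumption, see above).
metrics = diagnostic_metrics(tp=71, fn=9, tn=62, fp=18)
```

With these assumed counts, Youden's index and kappa both come to 0.6625 and the concordance to 0.83125, consistent with the reported 0.66, 0.66, and 83.1%.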

An Artificial Intelligence Approach for Word Semantic Similarity Measure of Hindi Language

  • Younas, Farah;Nadir, Jumana;Usman, Muhammad;Khan, Muhammad Attique;Khan, Sajid Ali;Kadry, Seifedine;Nam, Yunyoung
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 15, No. 6
    • /
    • pp.2049-2068
    • /
    • 2021
  • AI combined with NLP techniques has promoted the use of Virtual Assistants and has made people rely on them for many diverse uses. Conversational Agents are the most promising technique for assisting computer users through their operation. An important challenge in developing Conversational Agents globally is transferring the groundbreaking expertise obtained in English to other languages. AI is making it possible to transfer this learning. There is a dire need to develop systems that understand secular languages. One such difficult language is Hindi, which is the fourth most spoken language in the world. Semantic similarity is an important part of Natural Language Processing, with applications such as ontology learning and information extraction, for developing conversational agents. Most of the research is concentrated on English and other European languages. This paper presents a corpus-based word semantic similarity measure for Hindi. An experiment involving the translation of the English benchmark dataset to Hindi is performed, investigating the incorporation of the corpus, with human and machine similarity ratings. A significant correlation between human intuition and the algorithm ratings has been calculated to analyze the accuracy of the proposed similarity measures. The method can be adapted to various applications of word semantic similarity, or as a module for any other language.

The Role of Contrast in Prosodically Induced Acoustic Variation

  • Choi, Han-Sook
    • Phonetics and Speech Sciences (말소리와 음성과학)
    • /
    • Vol. 1, No. 3
    • /
    • pp.29-37
    • /
    • 2009
  • This paper presents results from speech production experiments on English, Korean, and Hindi that compare variation in the acoustic expression of dissimilar phonological laryngeal contrast in stops conditioned by prosodic prominence. Target stops are analyzed from utterance-initial, -medial, and -final positions, with a variation in contrastive focal accent, in speech data from six male American English speakers, five male Seoul Korean speakers, and five male Delhi Hindi speakers. The results show that prosodic prominence conditions enhanced distinctiveness between contrastive segments in the three languages. The manner in which prosodic prominence and prosodic phrase structure are marked at the level of segmental variation is, however, found to be language-specific to some extent. In addition, a correlation between the size of the phonological inventory and the corresponding acoustic variation was found, but the linear correlation was not strongly supported by the findings of the present study.

A Text Processing Method for Devanagari Scripts in Android

  • 김재혁;맹승렬
    • The Journal of the Korea Contents Association (한국콘텐츠학회논문지)
    • /
    • Vol. 11, No. 12
    • /
    • pp.560-569
    • /
    • 2011
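  • This paper proposes a method for processing Hindi text on Android, an open operating system. The core of text processing consists of an automaton that defines the rules for composing alphabet letters into characters, and font rendering, which retrieves the image corresponding to a character from a font file and displays it on the screen. Because an automaton depends on the types and number of input characters, we propose a Unicode-based automaton whose alphabet consists of 14 consonants and 34 vowels. Each composed syllable is converted to a glyph index by table mapping, and the index is used as a handle for loading the corresponding font glyph. By adding the proposed method as a separate module that follows the multilingual support framework of the FreeType font engine, Hindi can be supported at the system level. The validity of the proposed method is demonstrated with a messaging application.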
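
The two steps described in this abstract (an automaton that composes letters into syllables, followed by a table lookup from syllable to glyph index) can be sketched as below. This is a simplified illustration, not the paper's automaton: the character classes cover only the main Devanagari Unicode ranges (nukta and extended letters are ignored), the transition rules are an assumption, and the glyph table passed to `to_glyph_indices` is hypothetical.

```python
VIRAMA = "\u094d"


def char_class(ch):
    """Classify a character into a coarse Devanagari class (simplified ranges)."""
    code = ord(ch)
    if 0x0915 <= code <= 0x0939:
        return "C"  # consonant
    if 0x0904 <= code <= 0x0914:
        return "V"  # independent vowel
    if 0x093E <= code <= 0x094C:
        return "M"  # dependent vowel sign (matra)
    if ch == VIRAMA:
        return "H"  # virama (halant), joins consonants into conjuncts
    if 0x0901 <= code <= 0x0903:
        return "X"  # candrabindu, anusvara, visarga
    return "O"  # anything else (spaces, Latin letters, punctuation)


def split_syllables(text):
    """Group characters into syllables with a small state machine."""
    syllables, current, state = [], "", "start"
    for ch in text:
        cls = char_class(ch)
        # Transitions that keep the character inside the current syllable.
        extends = (
            (state == "C" and cls in ("H", "M", "X"))
            or (state == "H" and cls == "C")
            or (state in ("V", "M") and cls == "X")
        )
        if current and not extends:
            syllables.append(current)
            current = ""
        current += ch
        state = cls
    if current:
        syllables.append(current)
    return syllables


def to_glyph_indices(syllables, glyph_table):
    """Look up each syllable in a (hypothetical) syllable-to-glyph-index table."""
    return [glyph_table.get(syllable) for syllable in syllables]
```

For example, the word हिन्दी (code points U+0939 U+093F U+0928 U+094D U+0926 U+0940) splits into two syllables, हि and न्दी, each of which would map to one glyph index.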

The Interpretation of the Indian Stupa as Origin of Korean Pagoda

  • 이희봉
    • Journal of Architectural History (건축역사연구)
    • /
    • Vol. 18, No. 6
    • /
    • pp.103-126
    • /
    • 2009
  • This study aims to trace the historical trends and changes of form of stupas across India through field observation that is as direct as possible, by classifying, analyzing, and synthesizing the stupas. The study of Indian stupas in Korea has a number of shortcomings, since only an introductory, partial approach has been taken in order to seek the origin of the Korean pagoda. This study also aims to correct errors in the Chinese-character stupa terminology that earlier Japanese scholars established several decades ago by misinterpreting Hindi. Piled-up stupas were totally destroyed by pagans, so their remains tell us only of structure, material, size, and disposition. However, remains of carved stone at the torana and drum give us clues as to the original form of the stupa and worshipping activity, as well as the change to a more luxurious form. Many rock-cave stupas of India show both simple forms matching the ascetic age of early Buddhism and luxurious changes in the Mahayana era, which introduced statues of Buddha. Indians recovered the spherical form of the 'anda,' a Hindi term meaning cosmic egg, from the hemispherical form of the piled-up stupa. Therefore we might discard the erroneous term 'bokbal', which means an overturned vessel. Railings and parasols became main factors of stupa design. Carved railings around the stupa became a sign of divinity. Serious worshipping activity made drums long or high and created multi-embossed stripes. The bases of the circular drums of some cave stupas changed shape to rectangular or octagonal. Single parasols became multiple parasols with affluent flowerlike curved stems on carved stupas. The multistoried, elongated, and high parasols of Gandhara stupas are closely related to such factors as the diverse changes of form in the Indian subcontinent. The four-sided torana gate and ayaka column of the circular form of original stupas suggest the rectangular form of the subsequent East Asian pagoda, and the higher and wider base of Indian stupas became the origin of the East Asian rectangular pagoda.
