Search | Korea Science

KR-WordRank : An Unsupervised Korean Word Extraction Method Based on WordRank (KR-WordRank : WordRank를 개선한 비지도학습 기반 한국어 단어 추출 방법)

Kim, Hyun-Joong;Cho, Sungzoon;Kang, Pilsung
- Journal of Korean Institute of Industrial Engineers / v.40 no.1 / pp.18-33 / 2014
A word is the smallest unit of text analysis, and most text-mining algorithms assume that the words in given documents can be recognized perfectly. However, newly coined words, spelling and spacing errors, and domain adaptation problems make it difficult to recognize words correctly. Moreover, obtaining enough training data to cover every situation is not only unrealistic but also inefficient. An automatic word extraction method that requires no training process is therefore much needed. WordRank, the most widely used unsupervised word extraction algorithm for Chinese and Japanese, performs poorly on Korean because of differences in language structure. In this paper, we first discuss why WordRank performs poorly on Korean. We then propose KR-WordRank, a version of WordRank customized for Korean that takes its linguistic characteristics into account and is more robust to noise in text documents. Experimental results show that KR-WordRank performs significantly better than the original WordRank on Korean. We also find that the proposed algorithm can not only extract proper words but also identify candidate keywords for effective document summarization.
https://doi.org/10.7232/JKIIE.2014.40.1.018
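The abstract describes ranking word candidates on a graph without any training data. Below is a minimal, simplified sketch of that general idea; it is not the authors' KR-WordRank. It assumes that a Korean content word sits at the left of a space-delimited token (eojeol), so every prefix becomes a candidate. Prefixes of adjacent tokens are linked, and candidates are ranked with a PageRank-style iteration. The names `prefix_candidates`, `build_graph`, and `rank`, and the parameters `max_len` and `min_count`, are all hypothetical.

```python
from collections import defaultdict


def prefix_candidates(token, max_len):
    # Assumption: the content word is at the left of an eojeol and
    # particles/endings are on the right, so each prefix is a candidate.
    return [token[:k] for k in range(1, min(max_len, len(token)) + 1)]


def build_graph(sentences, max_len=4, min_count=2):
    counts = defaultdict(int)
    weights = defaultdict(float)
    for sentence in sentences:
        cands = [prefix_candidates(t, max_len) for t in sentence.split()]
        for group in cands:
            for c in group:
                counts[c] += 1
        # Link every candidate of a token with every candidate of the next token.
        for left, right in zip(cands, cands[1:]):
            for a in left:
                for b in right:
                    if a != b:
                        weights[(a, b)] += 1.0
                        weights[(b, a)] += 1.0
    # Drop rare candidates, which are mostly noise.
    nodes = {c for c, n in counts.items() if n >= min_count}
    graph = {v: {} for v in nodes}
    for (a, b), w in weights.items():
        if a in nodes and b in nodes:
            graph[a][b] = w
    return graph


def rank(graph, damping=0.85, iters=50):
    # Weighted PageRank-style power iteration over the candidate graph.
    out_sum = {v: sum(nbrs.values()) for v, nbrs in graph.items()}
    scores = {v: 1.0 for v in graph}
    for _ in range(iters):
        new = {v: 1.0 - damping for v in graph}
        for u, nbrs in graph.items():
            for v, w in nbrs.items():
                new[v] += damping * scores[u] * w / out_sum[u]
        scores = new
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = ["데이터 분석은 재미있다", "데이터를 분석하는 방법", "텍스트 데이터는 많다"]
for word, score in rank(build_graph(docs)):
    print(word, round(score, 3))
```

On these toy sentences, the prefixes 데, 데이, and 데이터 share exactly the same neighbours, so they receive identical scores. Separating the true word from its fragments requires extra filtering, which is the kind of Korean-specific post-processing a method like this has to add.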

Brainstorming using TextRank algorithms and Artificial Intelligence (TextRank 알고리즘 및 인공지능을 활용한 브레인스토밍)

Sang-Yeong Lee;Chang-Min Yoo;Gi-Beom Hong;Jun-Hyuk Oh;Il-young Moon
- Journal of Practical Engineering Education / v.15 no.2 / pp.509-517 / 2023
The responsive web service provides a related-word recommendation system based on the TextRank algorithm and an idea generation service based on words selected by the user. For the related-word recommendation system, we discuss how each word is weighted with the TextRank algorithm and how output probabilities are produced with softmax. For the idea generation service, we discuss the idea generation method and a reinforcement learning method for the AI model using mini-GPT. For the web service, we describe how React, Spring Boot, and Flask are linked and how the whole system operates. When the user enters a topic, the service suggests associated words. The user builds a mind map by selecting related words or adding their own. When the user selects words to combine from the mind map, the service generates new ideas and presents related patents. Generated ideas can be shared with other users, and the AI model is improved using user feedback in the form of star ratings.
https://doi.org/10.14702/JPEE.2023.509
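The abstract combines TextRank word weights with a softmax to produce recommendation probabilities. A minimal sketch of that pipeline follows. It assumes a co-occurrence window over a pre-tokenized word list, and it assumes that the words recommended for a topic are its graph neighbours. The names `cooccurrence_graph`, `textrank`, `softmax`, and `recommend`, and the `window`/`temperature` parameters, are hypothetical and not taken from the paper.

```python
import math
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    # Undirected weighted graph: words at most window-1 positions apart are linked.
    graph = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            u = tokens[j]
            if u != w:
                graph[w][u] += 1.0
                graph[u][w] += 1.0
    return graph


def textrank(graph, d=0.85, iters=50):
    out = {v: sum(graph[v].values()) for v in graph}
    score = {v: 1.0 for v in graph}
    for _ in range(iters):
        score = {
            v: (1 - d) + d * sum(score[u] * graph[u][v] / out[u] for u in graph[v])
            for v in graph
        }
    return score


def softmax(scores, temperature=1.0):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores.values())
    exps = {w: math.exp((s - m) / temperature) for w, s in scores.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}


def recommend(topic, tokens, k=5, temperature=1.0):
    # Hypothetical: candidates are the topic's neighbours, ranked by TextRank.
    graph = cooccurrence_graph(tokens)
    scores = textrank(graph)
    neighbours = sorted(graph.get(topic, {}), key=lambda w: scores[w], reverse=True)[:k]
    if not neighbours:
        return {}
    return softmax({w: scores[w] for w in neighbours}, temperature)


tokens = "idea map idea patent map user idea map".split()
print(recommend("idea", tokens, k=2))
```

A lower `temperature` concentrates probability on the top-weighted word. A higher one spreads it more evenly across the candidates.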

Text Categorization Using TextRank Algorithm (TextRank 알고리즘을 이용한 문서 범주화)

Bae, Won-Sik;Cha, Jeong-Won
- Journal of KIISE:Computing Practices and Letters / v.16 no.1 / pp.110-114 / 2010
We describe a new method for text categorization using the TextRank algorithm. Text categorization is the task of assigning one or more predefined categories to a text document. TextRank is a graph-based ranking algorithm. If each word is treated as a vertex and the co-occurrence of two adjacent words as an edge, a graph can be built from a document. We then use TextRank to find the important words in this graph. From them we construct features, each a pair consisting of an important word and a word adjacent to it. We use four classifiers: SVM, the Naïve Bayesian classifier, the Maximum Entropy Model, and the k-NN classifier. We use the non-cross-posted version of the 20 Newsgroups data set. Performance improved for all classifiers, which suggests that TextRank is a promising approach to text categorization.
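The feature construction described above can be sketched in three steps. Adjacent words form an undirected graph, TextRank scores the vertices, and each top-ranked word is paired with its neighbours. This is a minimal illustration with unweighted edges and a hypothetical `top_k` cut-off. The function names are illustrative, and the paper's actual thresholds are not given in the abstract.

```python
from collections import defaultdict


def adjacency_graph(tokens):
    # Vertices are words; an edge joins two words that appear next to each other.
    graph = defaultdict(set)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def textrank(graph, d=0.85, iters=50):
    score = {v: 1.0 for v in graph}
    for _ in range(iters):
        score = {
            v: (1 - d) + d * sum(score[u] / len(graph[u]) for u in graph[v])
            for v in graph
        }
    return score


def pair_features(tokens, top_k=3):
    # Features are (important word, adjacent word) pairs.
    graph = adjacency_graph(tokens)
    score = textrank(graph)
    important = sorted(score, key=lambda w: (-score[w], w))[:top_k]
    return sorted((w, nb) for w in important for nb in graph[w])


tokens = "the cat sat on the mat the cat ran".split()
print(pair_features(tokens, top_k=2))
```

The resulting pairs would then be used as binary or count features for any of the classifiers listed in the abstract.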

A Novel Ontology Matching Model to Address Ontology Heterogeneity

Hongzhou Duan;Yongju Lee
- International Journal of Internet, Broadcasting and Communication / v.17 no.1 / pp.151-162 / 2025
This study introduces a novel ontology matching model that addresses ontology heterogeneity by leveraging both the textual and structural information within ontologies, along with external data. The model uses a word embedding approach to refine word vectors so that semantically similar descriptions can be better distinguished from merely associative ones. It also adopts BERT to generate dynamic word vectors, enabling finer distinction between the senses of polysemous terms. The model computes structural similarity between entities by transforming the ontologies into graphs and applying the SimRank algorithm. The matching process uses a stable matching algorithm to obtain stable one-to-one correspondences, while one-to-many matches are determined through similarity thresholds and comparative analysis.
https://doi.org/10.7236/IJIBC.2025.17.1.151
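The abstract names two standard building blocks: SimRank for structural similarity and a stable matching algorithm for one-to-one correspondences. The sketch below shows a naive SimRank iteration and a Gale-Shapley matcher driven by a similarity table. How the paper combines these with its BERT and embedding scores is not specified here. The entity names and similarity values in the usage are hypothetical.

```python
def simrank(in_nbrs, c=0.8, iters=10):
    """Naive SimRank. in_nbrs maps every node to the list of its in-neighbours."""
    nodes = list(in_nbrs)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iters):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0
                elif in_nbrs[a] and in_nbrs[b]:
                    total = sum(sim[(i, j)] for i in in_nbrs[a] for j in in_nbrs[b])
                    new[(a, b)] = c * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
                else:
                    new[(a, b)] = 0.0  # a node without in-neighbours
        sim = new
    return sim


def stable_match(scores, left, right):
    """Gale-Shapley on a similarity table scores[(l, r)]; the left side proposes."""
    prefs = {l: sorted(right, key=lambda r: -scores[(l, r)]) for l in left}
    rank_of = {r: {l: -scores[(l, r)] for l in left} for r in right}  # lower is better
    next_choice = {l: 0 for l in left}
    engaged = {}  # right entity -> left entity
    free = list(left)
    while free:
        l = free.pop()
        if next_choice[l] >= len(prefs[l]):
            continue  # l has proposed to everyone and stays unmatched
        r = prefs[l][next_choice[l]]
        next_choice[l] += 1
        current = engaged.get(r)
        if current is None:
            engaged[r] = l
        elif rank_of[r][l] < rank_of[r][current]:
            engaged[r] = l
            free.append(current)
        else:
            free.append(l)
    return {l: r for r, l in engaged.items()}


sim = simrank({"root": [], "x": ["root"], "y": ["root"]})
scores = {("Person", "Human"): 0.9, ("Person", "Article"): 0.2,
          ("Paper", "Human"): 0.3, ("Paper", "Article"): 0.8}
print(sim[("x", "y")], stable_match(scores, ["Person", "Paper"], ["Human", "Article"]))
```

Gale-Shapley guarantees that no left/right pair would both prefer each other over their assigned partners. This property is what makes the one-to-one correspondences "stable".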

Proposal of keyword extraction method based on morphological analysis and PageRank in Twitter (트위터에서 형태소 분석과 PageRank 기반 화제단어 추출 방법 제안)

Lee, Won-Hyung;Cho, Sung-Il;Kim, Dong-Hoi
- Journal of Digital Contents Society / v.19 no.1 / pp.157-163 / 2018
People who use SNS publish their diverse ideas there every day, so the data posted on SNS contain the thoughts and opinions of many people. In particular, Twitter's popular keywords are produced by counting the words that appear most frequently in user posts and ranking them. However, because this method simply lists repeated words, it is sensitive to unnecessary data. The proposed method instead ranks words by topicality using a graph of relationships between words. This reduces the influence of unnecessary data and allows the main words to be extracted stably. We compare the proposed scheme, based on morphological analysis and PageRank, with the existing scheme, based on the number of appearances. The comparison covers keyword rank order and the ratio of meaningless keywords among the top 20 keywords. Meaningless keywords made up 55% of the top 20 for the proposed scheme and 70% for the existing scheme, an improvement of about 15 percentage points.
https://doi.org/10.9728/dcs.2018.19.1.157
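Below is a minimal sketch of ranking words with PageRank over a word-relationship graph, plus the top-20 "meaningless keyword" ratio used for evaluation. The sketch makes two assumptions: the morphological analysis has already reduced each post to content words, and words are linked when they co-occur in the same post. Both are choices made for illustration, and `word_graph`, `pagerank`, and `meaningless_ratio` are hypothetical names. With the reported figures, 70% versus 55% is a 15-percentage-point reduction.

```python
from collections import defaultdict
from itertools import combinations


def word_graph(posts):
    # Each post is assumed to be already reduced to content words by a
    # morphological analyser (not shown). Words in the same post are linked.
    graph = defaultdict(lambda: defaultdict(float))
    for words in posts:
        for a, b in combinations(sorted(set(words)), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def pagerank(graph, d=0.85, iters=100):
    # Normalized PageRank: scores sum to 1 because every node has out-edges.
    nodes = list(graph)
    n = len(nodes)
    out = {v: sum(graph[v].values()) for v in nodes}
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - d) / n for v in nodes}
        for u in nodes:
            for v, w in graph[u].items():
                new[v] += d * pr[u] * w / out[u]
        pr = new
    return pr


def meaningless_ratio(ranked_words, meaningless, k=20):
    # Share of the top-k words that a human judged meaningless.
    top = ranked_words[:k]
    return sum(w in meaningless for w in top) / len(top) if top else 0.0


posts = [["election", "debate", "lol"], ["election", "vote"], ["debate", "vote", "election"]]
pr = pagerank(word_graph(posts))
ranked = sorted(pr, key=pr.get, reverse=True)
print(ranked, meaningless_ratio(ranked, {"lol"}))
```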

A Study for Development of a Korean Pain Measurement Tool(II). A Study for Testing Ranks of Words in each Subclass of a Korean Pain Measurement Tool (동통 평가도구 개발을 위한 연구 -한국 통증 어휘별 강도 순위의 유의도 및 신뢰도 검사-)

이은옥;송미순
- Journal of Korean Academy of Nursing / v.13 no.3 / pp.106-118 / 1983
The main purpose of this study is to systematically classify words describing pain according to their rank within each subclass. The study is part of developing a Korean Pain Measurement Tool. It did not explore each word's dimension, such as sensory or affective. Eighty-three Korean words, tentatively classified into 19 subclasses in a previous study, were used. Each subclass contained three to six words, presented in random order, and each subject ranked the words by degree of pain. One hundred and fifty nursing students and one hundred clinical nurses were asked to rank each word. One hundred and sixteen students and eighty-three nurses completed the ratings used in the analysis. The data were collected from June to July 1983. Because the data were ordinal, Friedman's ANOVA was used to test for significant differences between rank means. All 19 subclasses showed significant differences in rank means among their pain words. Some words were removed, replaced by other words, or reordered. Words were removed from six subclasses: 1) simple stimulating pain, 2) punctuate pressure, 3) peripheral nerve pain, 4) radiation pain, 5) punishment-related pain, and 6) suffering-related pain. Words were replaced or reordered in six subclasses: 1) incisive pressure, 2) constrictive pressure, 3) dull pain, 4) tract pain, 5) digestion-related pain, and 6) fear-related pain. Four subclasses (traction pressure, thermal, cavity pressure, and fatigue-related pain) showed significant differences among rank means with no visible overlap of ranks among the means. Further research is needed using higher-level measurement of the pain degree of each word and more sophisticated analysis of pain degrees. Three pain words likely related to chemical stimulation were newly identified and added as a new subclass.
Through this study, the total number of subclasses increased from 19 to 20, and the total number of Korean words in the scale decreased from 83 to 80.
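The Friedman test fits this design: each subject (block) ranks the k words of one subclass. The statistic is Q = 12 Σ R_j² / (n k (k+1)) − 3n(k+1), where R_j is the rank sum of word j over n subjects. It is compared against a chi-square distribution with k − 1 degrees of freedom. The minimal sketch below assumes no tied ranks and omits the tie correction. The function name `friedman_statistic` and the rankings in the usage are hypothetical. For real data with p-values, `scipy.stats.friedmanchisquare` computes the same test.

```python
def friedman_statistic(rankings):
    """Friedman chi-square statistic without tie correction.

    rankings[i][j] is the rank (1..k) that subject i gave to word j.
    Returns (Q, degrees_of_freedom).
    """
    n = len(rankings)
    k = len(rankings[0])
    rank_sums = [sum(row[j] for row in rankings) for j in range(k)]
    q = 12.0 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3.0 * n * (k + 1)
    return q, k - 1


# Hypothetical: four subjects rank the three words of one subclass identically.
q, df = friedman_statistic([[1, 2, 3]] * 4)
print(q, df)  # 8.0 2
```

Perfect agreement gives the maximum Q = n(k − 1) = 8, above the 5% chi-square critical value of about 5.99 for 2 degrees of freedom. Completely balanced rankings give Q = 0.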

Ranking Translation Word Selection Using a Bilingual Dictionary and WordNet

Kim, Kweon-Yang;Park, Se-Young
- Journal of the Korean Institute of Intelligent Systems / v.16 no.1 / pp.124-129 / 2006
This paper presents a method for ranking translation word candidates for Korean verbs. It is based on lexical knowledge from a bilingual Korean-English dictionary and WordNet, both easily obtainable resources. We decide which translation of the target word is most appropriate by measuring semantic relatedness through 45 extended relations. This relatedness is measured between the possible translations of the target word and indicative clue words that serve as predicate arguments in the source-language text. To reduce the influence of possibly unwanted senses, we rank the possible senses of each translation word by measuring the semantic similarity between the translation word and its near synonyms. We report an average accuracy of 51% on ten ambiguous Korean verbs. The evaluation suggests that our approach outperforms both the default baseline and previous work.
https://doi.org/10.5391/JKIIS.2006.16.1.124
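A minimal sketch of ranking translation candidates by their relatedness to clue words follows. It uses a tiny hand-made relation graph as a stand-in for WordNet; the 45 extended relations are not reproduced. Relatedness is assumed to be 1 / (1 + shortest path length), and the score of a translation is its summed relatedness to the clue words. The `RELATIONS` graph and all function names are hypothetical. In the usage, the ambiguous Korean verb 타다 has the candidates "ride" and "burn", with the clue word "car" (차).

```python
from collections import deque

# Hypothetical toy relation graph standing in for WordNet relations.
RELATIONS = {
    "ride": ["vehicle"], "vehicle": ["ride", "car", "bus"],
    "car": ["vehicle"], "bus": ["vehicle"],
    "burn": ["fire"], "fire": ["burn", "wood"], "wood": ["fire"],
}


def path_length(a, b, graph):
    # Breadth-first search; returns None when b is unreachable from a.
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nb in graph.get(node, []):
            if nb == b:
                return dist + 1
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None


def relatedness(a, b, graph):
    d = path_length(a, b, graph)
    return 0.0 if d is None else 1.0 / (1 + d)


def rank_translations(candidates, clue_words, graph):
    scored = [(t, sum(relatedness(t, c, graph) for c in clue_words)) for t in candidates]
    return sorted(scored, key=lambda ts: ts[1], reverse=True)


print(rank_translations(["burn", "ride"], ["car"], RELATIONS))
```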

Automatic Keyword Extraction using Hierarchical Graph Model Based on Word Co-occurrences (단어 동시출현관계로 구축한 계층적 그래프 모델을 활용한 자동 키워드 추출 방법)

Song, KwangHo;Kim, Yoo-Sung
- Journal of KIISE / v.44 no.5 / pp.522-536 / 2017
Keyword extraction can be used in text mining of massive document collections to efficiently extract the subject or related words of a document. In this study, we propose a hierarchical graph model of a single document. The model is based on word co-occurrence, the intrinsic dependency relationships between words, and common sub-words. We also propose an enhanced TextRank algorithm that reflects the influence of outgoing edges as well as incoming edges. Using this model and the enhanced TextRank algorithm, we then propose a new scheme for extracting representative keywords from a single document. In the experiments, various evaluation methods were applied to documents on various subjects to verify the accuracy and adaptability of the proposed scheme. The proposed scheme performed better than previous schemes.
https://doi.org/10.5626/JOK.2017.44.5.522
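The abstract does not give the enhanced TextRank formula. The sketch below shows one hypothetical way to let outgoing edges contribute, and it should not be read as the paper's method. The directed graph links each word to the next word. A vertex's score mixes the usual incoming term with a reverse term from its successors, weighted by a hypothetical parameter `beta`. With `beta=0`, the sketch reduces to ordinary weighted TextRank on the directed graph.

```python
from collections import defaultdict


def bidirectional_textrank(tokens, beta=0.3, d=0.85, iters=50):
    nodes = set(tokens)
    # Directed edges from each word to the word that follows it.
    out_w = defaultdict(dict)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            out_w[a][b] = out_w[a].get(b, 0.0) + 1.0
    in_w = defaultdict(dict)
    for a, nbrs in out_w.items():
        for b, w in nbrs.items():
            in_w[b][a] = w
    out_sum = {v: sum(out_w[v].values()) for v in nodes}
    in_sum = {v: sum(in_w[v].values()) for v in nodes}
    score = {v: 1.0 for v in nodes}
    for _ in range(iters):
        new = {}
        for v in nodes:
            # Standard TextRank contribution from predecessors.
            incoming = sum(score[u] * w / out_sum[u] for u, w in in_w[v].items())
            # Hypothetical reverse contribution from successors.
            outgoing = sum(score[u] * w / in_sum[u] for u, w in out_w[v].items())
            new[v] = (1 - d) + d * ((1 - beta) * incoming + beta * outgoing)
        score = new
    return score


print(bidirectional_textrank(["a", "b", "c"], beta=0.0))
print(bidirectional_textrank(["a", "b", "c"], beta=0.5))
```

With `beta=0`, the first word "a" has no predecessors and stays at the floor value 1 − d. With `beta>0`, it gains credit for pointing to other words.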

A Reranking Method Using Query Expansion and PageRank Check (페이지 랭크지수와 질의 확장을 이용한 재랭킹 방법)

Kim, Tae-Hwan;Jeon, Ho-Chul;Choi, Joong-Min
- The KIPS Transactions:PartB / v.18B no.4 / pp.231-240 / 2011
Researchers have implemented many search algorithms for the World Wide Web. One of the most successful is Google's, which uses PageRank. The PageRank approach scores each document by its incoming links (inlinks) and ranks documents accordingly. However, this method favors documents that are valuable to the general public rather than to the individual user, which can make it difficult to find the results a user needs. To address this problem, we use WordNet to analyze the user's query history. We propose a personalized search engine that uses the user's query history together with a PageRank check. We compared the top 30 results of the proposed approach with Google search results. The proposed approach achieved an average R-precision of about 60%, about 14% better than Google.
https://doi.org/10.3745/KIPSTB.2011.18B.4.231
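R-precision, the evaluation measure above, is the precision over the top R results, where R is the number of relevant documents. The sketch below computes it and shows a hypothetical reranking step that blends query-history overlap with a PageRank score. Plain term overlap stands in for the paper's WordNet-based analysis. The linear weighting `alpha` and the names `rerank` and `pr_scores` are assumptions.

```python
def r_precision(ranked_ids, relevant_ids):
    # Precision over the top R results, where R = number of relevant documents.
    r = len(relevant_ids)
    if r == 0:
        return 0.0
    return sum(doc in relevant_ids for doc in ranked_ids[:r]) / r


def rerank(docs, query, history_terms, pr_scores, alpha=0.7):
    """docs: doc id -> set of terms; pr_scores: doc id -> PageRank in [0, 1]."""
    expanded = set(query) | set(history_terms)  # naive query expansion

    def score(doc_id):
        overlap = len(expanded & docs[doc_id]) / len(expanded)
        return alpha * overlap + (1 - alpha) * pr_scores[doc_id]

    return sorted(docs, key=score, reverse=True)


docs = {"d1": {"python", "snake"}, "d2": {"python", "programming"}}
pr_scores = {"d1": 0.9, "d2": 0.1}
print(rerank(docs, ["python"], [], pr_scores))               # popularity wins
print(rerank(docs, ["python"], ["programming"], pr_scores))  # history wins
print(r_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"}))
```

Without history, the popular page d1 comes first. With a history term such as "programming", the personalized match d2 overtakes it, which is the effect the abstract aims for.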

An improved spectrum mapping applied to speaker adaptive Korean word recognition

Matsumoto, Hiroshi;Lee, Yong-Ju;Kim, Hoi-Rim;Kido, Ken'iti
- Proceedings of the Acoustical Society of Korea Conference / 1994.06a / pp.1009-1014 / 1994
This paper improves a previously proposed spectral mapping method for supervised speaker adaptation. In that method, a mapped spectrum is interpolated from speaker difference vectors at typical spectra under a minimum-distortion criterion. When estimating these difference vectors, it is important to choose an appropriate number of typical points. The previous method adjusts this number empirically, whereas the present method optimizes the effective number through rank reduction of the normal equation. The algorithm was applied to supervised speaker adaptation for Korean word recognition, using templates from a prototype male speaker. The results showed that the rank reduction technique determines the optimal number of code vectors automatically. It also slightly improves recognition scores compared with the previous method.
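Rank reduction of a normal equation can be illustrated with a generic least-squares problem; the paper's spectral-mapping setup is not reproduced. The system AᵀA x = Aᵀb is solved through an eigendecomposition of AᵀA. Only directions whose eigenvalue exceeds a relative threshold are kept, so the effective rank is chosen automatically instead of by hand. The sketch assumes NumPy. The function name `rank_reduced_solve` and the threshold `rel_tol` are hypothetical.

```python
import numpy as np


def rank_reduced_solve(A, b, rel_tol=1e-3):
    """Solve A^T A x = A^T b using only eigen-directions whose eigenvalue
    exceeds rel_tol * (largest eigenvalue). Returns (x, effective_rank)."""
    AtA = A.T @ A
    Atb = A.T @ b
    eigvals, eigvecs = np.linalg.eigh(AtA)  # symmetric matrix, ascending eigenvalues
    keep = eigvals > rel_tol * eigvals.max()
    V = eigvecs[:, keep]
    # Pseudo-inverse restricted to the retained subspace.
    x = V @ ((V.T @ Atb) / eigvals[keep])
    return x, int(keep.sum())


# Two identical columns make the plain normal equation singular.
A = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
b = np.array([2.0, 4.0, 6.0])
x, r = rank_reduced_solve(A, b)
print(x, r)
```

With the collinear columns above, the plain normal equation has no unique solution. The rank-reduced solve keeps a single direction and returns the minimum-norm solution.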
