• Title/Summary/Keyword: Universal Dependencies

Universal POS Tagset for Korean (Universal POS 태그셋의 한국어 적용)

  • Park, Hye-Jin;Oh, Tae-Hwan;Kim, Han-Saem
    • Annual Conference on Human and Language Technology / 2018.10a / pp.417-421 / 2018
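  • The Universal Dependencies (UD) project currently comprises 122 treebanks in 71 languages and aims to identify morphological and syntactic features that can be applied across languages for parallel language processing. This paper examines Universal POS (UPOS), the morphological tagset of UD, and proposes a method for automatically converting an existing Korean morphological tagset to UPOS. Because the UPOS scheme was built around inflectional languages such as English, applying it to Korean, an agglutinative language, requires a one-to-many mapping between individual UPOS labels and combinations of 21st Century Sejong Project morphological annotation tags.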
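
The one-to-many mapping described in this abstract can be pictured with a small sketch: several Sejong tag combinations for one word unit (eojeol) collapse onto a single UPOS label. The Sejong tag names (NNG, JKB, XSV, and so on) are real, but the mapping tables, the rule order, and the function name `eojeol_to_upos` are assumptions made for illustration; they are not the conversion table proposed in the paper.

```python
# Illustrative sketch only: these tables are assumptions, not the paper's mapping.

# Sejong tags of content morphemes and a plausible UPOS label for each.
CONTENT_TAG_TO_UPOS = {
    "NNG": "NOUN", "NNB": "NOUN", "NNP": "PROPN", "NP": "PRON",
    "NR": "NUM", "SN": "NUM", "VV": "VERB", "VA": "ADJ", "VX": "AUX",
    "MM": "DET", "MAG": "ADV", "MAJ": "CCONJ", "IC": "INTJ",
}

# Derivational suffixes that change the category of the whole eojeol,
# e.g. NNG+XSV (noun + verb-deriving suffix) behaves as a verb.
DERIVING_SUFFIX_TO_UPOS = {"XSV": "VERB", "XSA": "ADJ"}

# Sejong symbol tags treated as punctuation in this sketch.
PUNCT_TAGS = {"SF", "SP", "SS", "SE", "SO"}


def eojeol_to_upos(sejong_tags):
    """Map the Sejong tag sequence of one eojeol to a single UPOS label."""
    tags = [t for t in sejong_tags if t not in PUNCT_TAGS]
    if not tags:
        # Only punctuation (or nothing at all) was present.
        return "PUNCT" if sejong_tags else "X"
    upos = "X"
    for tag in tags:
        if tag in DERIVING_SUFFIX_TO_UPOS:
            # A deriving suffix overrides the category of the stem.
            return DERIVING_SUFFIX_TO_UPOS[tag]
        if upos == "X" and tag in CONTENT_TAG_TO_UPOS:
            # The first content morpheme decides the label by default.
            upos = CONTENT_TAG_TO_UPOS[tag]
    return upos


# Different Sejong combinations land on the same UPOS label.
for combo in (["VV", "EC"], ["VV", "EP", "EF", "SF"], ["NNG", "XSV", "EF"]):
    print("+".join(combo), "->", eojeol_to_upos(combo))
```

Particles and endings (JKB, EP, EF, and similar) carry no entry in the tables, so they never decide the label in this sketch; a real conversion would need rules for those combinations as well.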

Manual Revision of Penn Korean Universal Dependency Treebank (Penn Korean Universal Dependency Treebank 데이터셋 구축)

  • Oh, Taehwan;Han, Jiyoon;Kim, Hansaem
    • Annual Conference on Human and Language Technology / 2021.10a / pp.61-65 / 2021
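  • This study analyzes errors in the Penn Korean Universal Dependency Treebank released in 2018 (hereafter PKT-UD v2018) and revises the data to build a new dataset (hereafter PKT-UD v2020). PKT-UD v2018 was built by automatically converting the Penn Korean Treebank, which was constructed with phrase-structure analysis, into the UD (Universal Dependencies) scheme and then correcting the result. This study corrects the errors introduced during that automatic conversion and proposes a way to build a dataset that makes the fullest use of the UD scheme while reflecting the characteristics of Korean.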

A Case Study on Universal Dependency Tagsets (다국어 범용 의존관계 주석체계(Universal Dependencies) 적용 연구 - 한국어와 일본어의 비교를 중심으로)

  • Han, Jiyoon;Lee, Jin;Lee, Chanyoung;Kim, Hansaem
    • Cross-Cultural Studies / v.53 / pp.163-192 / 2018
  • This paper examines how Universal Dependencies (UD) has been applied to Korean and Japanese, two languages with similar morphological characteristics, and uses the comparison to consider how UD can be applied to Korean and improved. Because both languages are agglutinative, they have highly developed grammatical morphology, which makes it difficult to apply UD, a scheme built around inflectional languages such as English. We discuss the application of UPOS and DEPREL, the components of UD. For UPOS, we look at the categorization of predicate-related tags such as AUX, ADJ, and VERB, and at how annotation units are handled. For the DEPREL annotation scheme, we discuss how syntactic issues follow from the choice of the basic unit for syntactic tags. We investigate the problems with case and aux that arise from how the head position is set in Korean and Japanese, both head-final languages, as well as the annotation of parallel structures and the choice of the basic annotation unit. Among the relation tags, case and aux are discussed because they show the most noticeable differences in distribution when the Japanese annotation patterns are compared with Korean. The case relation concerns particles in both Korean and Japanese. The aux relation corresponds to auxiliary predicates in Korean and to auxiliary verbs in Japanese. An examination of specific annotation patterns shows that Japanese aux is assigned not only to auxiliary predicates but also to auxiliary elements that add grammatical meaning to the verb and to forms corresponding to Korean endings. Japanese UD annotation takes the basic unit of morphological analysis as the basic unit of syntactic annotation. Korean therefore also needs to consider how morphological analysis units can be used as information for dependency annotation.