Relevant Image Retrieval of Korean Documents based on Sentence and Word Importance

  • Kim, Nam-Gyu (Division of Computer & Information Engineering, Daegu University)
  • Kang, Shin-Jae (Division of Computer & Information Engineering, Daegu University)
  • Received : 2018.12.05
  • Accepted : 2019.03.08
  • Published : 2019.03.31

Abstract

When readers of a text-only document encounter unfamiliar words, they lose concentration and have difficulty understanding the content. Children in particular have little experience, so even when they know the words, they struggle to picture a scene if the description is unfamiliar or ambiguous. To aid comprehension and increase reader interest, this paper implements a system that analyzes the text of a document, selects the content judged to be important, automatically retrieves the most relevant images from the web, and displays them linked to the text. The system divides a document into paragraphs, selects important sentences in each paragraph, selects the important words that best represent those sentences, searches the web for images related to the words, and links the retrieved images to the corresponding paragraphs. Through experiments, we present a method for selecting important sentences in a document and a method for selecting important words within a sentence. When the relevance between the three selected images and the corresponding important sentence was evaluated in terms of accuracy, the system achieved 60%.
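The pipeline described in the abstract (paragraph splitting, important-sentence selection, important-word selection, image query) can be illustrated with a minimal sketch. The abstract does not name the scoring algorithms, so this sketch assumes a simplified TextRank-style score for sentences and TF-IDF for words; all function names are hypothetical, the tokenizer is a plain regular expression (real Korean text would need a morphological analyzer), and the web image search itself is left out, so the output is only the query string for each paragraph.

```python
import math
import re
from collections import Counter


def split_paragraphs(text):
    """Split a document into paragraphs on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    """Naive splitter on ., ! and ? (an assumption of this sketch)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def tokenize(sentence):
    """Regex tokenizer; a stand-in for a Korean morphological analyzer."""
    return re.findall(r"\w+", sentence.lower())


def rank_sentences(sentences, damping=0.85, iterations=30):
    """TextRank-style scores: PageRank over a word-overlap similarity graph."""
    tokens = [set(tokenize(s)) for s in sentences]
    n = len(sentences)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and tokens[i] and tokens[j]:
                overlap = len(tokens[i] & tokens[j])
                # +1 inside the logs avoids a zero denominator for one-word sentences
                norm = math.log(len(tokens[i]) + 1) + math.log(len(tokens[j]) + 1)
                sim[i][j] = overlap / norm
    out_weight = [sum(row) for row in sim]
    scores = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            incoming = sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if sim[j][i] > 0
            )
            updated.append((1 - damping) + damping * incoming)
        scores = updated
    return scores


def top_words(sentence, all_sentences, k=2):
    """Pick up to k words by TF-IDF, treating every sentence as one document."""
    docs = [set(tokenize(s)) for s in all_sentences]
    tf = Counter(tokenize(sentence))

    def tfidf(word):
        df = sum(1 for d in docs if word in d)
        return tf[word] * (math.log(len(docs) / max(df, 1)) + 1.0)

    # Ties are broken alphabetically so the result is deterministic
    return sorted(tf, key=lambda w: (-tfidf(w), w))[:k]


def build_queries(text, k=2):
    """Return one (important sentence, image-search query) pair per paragraph."""
    paragraphs = split_paragraphs(text)
    all_sentences = [s for p in paragraphs for s in split_sentences(p)]
    results = []
    for paragraph in paragraphs:
        sentences = split_sentences(paragraph)
        scores = rank_sentences(sentences)
        best = sentences[max(range(len(sentences)), key=scores.__getitem__)]
        results.append((best, " ".join(top_words(best, all_sentences, k))))
    return results
```

Each query string returned by `build_queries` would then be passed to a web image search, and the retrieved images attached to the paragraph the sentence came from. Scoring words against all sentences of the document, rather than only those in the paragraph, is a design choice of this sketch so that words common to the whole document are weighted down.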

Fig. 1. Overall Process for Image Retrieval System

Fig. 2. Calculation of the importance of sentences and words through various algorithms

Fig. 3. Example of system results

Table 1. Image accuracy by word and sentence selection algorithm
