AI Performance Based On Learning-Data Labeling Accuracy

  • Ji-Hoon Lee (Dept. of Biomedical Informatics, College of Medicine, Konyang University)
  • Jieun Shin (Dept. of Biomedical Informatics, College of Medicine, Konyang University; Head of the HealthCareData Verification Center, Konyang University Hospital)
  • Received : 2023.12.11
  • Accepted : 2024.01.20
  • Published : 2024.01.28

Abstract

This study examines how data quality affects the performance of artificial intelligence (AI). To this end, the effect of labeling error levels on AI performance was compared and analyzed through simulation, taking into account the similarity of data features and the imbalance of class composition. Data with high similarity between features were found to be more sensitive to labeling accuracy than data with low similarity between features, and AI accuracy tended to decrease rapidly as class imbalance increased. These findings can serve as baseline data for establishing quality evaluation criteria for AI training data and for related research.

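
The kind of simulation summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' experimental design: the abstract does not state the data generator, the model, or the parameter values, so everything below is an assumption made for illustration. Two Gaussian classes stand in for the data, the hypothetical `separation` parameter controls feature similarity (smaller means more similar), `minority_frac` controls class imbalance, `error_rate` is the fraction of training labels flipped, and a simple k-nearest-neighbor classifier is swapped in as the model. Test labels are kept clean so that the measured accuracy reflects only the effect of training-label errors.

```python
import random


def make_data(n, separation, minority_frac, rng):
    """Draw n labeled 2-D points from two unit-variance Gaussian classes.

    Assumption for illustration: class 0 is centered at (0, 0) and class 1
    at (separation, separation), so a smaller separation means the two
    classes have more similar features.
    """
    data = []
    for _ in range(n):
        label = 1 if rng.random() < minority_frac else 0
        center = separation if label == 1 else 0.0
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data


def flip_labels(data, error_rate, rng):
    """Flip each binary label independently with probability error_rate."""
    return [(x, 1 - y if rng.random() < error_rate else y) for x, y in data]


def knn_predict(train, x, k=5):
    """Predict by majority vote among the k nearest training points."""
    nearest = sorted(
        train,
        key=lambda item: (item[0][0] - x[0]) ** 2 + (item[0][1] - x[1]) ** 2,
    )[:k]
    votes = sum(y for _, y in nearest)
    return 1 if votes * 2 > k else 0


def run(separation, minority_frac, error_rate, seed=0, n=400):
    """Train on noisy labels, then report accuracy on a clean test set."""
    rng = random.Random(seed)
    train = flip_labels(make_data(n, separation, minority_frac, rng), error_rate, rng)
    test = make_data(n, separation, minority_frac, rng)  # test labels stay clean
    correct = sum(knn_predict(train, x) == y for x, y in test)
    return correct / len(test)


if __name__ == "__main__":
    # Hypothetical grid: similar vs. dissimilar features, balanced vs. imbalanced classes.
    for separation in (1.0, 3.0):
        for minority_frac in (0.5, 0.1):
            for error_rate in (0.0, 0.2, 0.4):
                acc = run(separation, minority_frac, error_rate)
                print(f"sep={separation} minority={minority_frac} "
                      f"error={error_rate} accuracy={acc:.3f}")
```

Running the grid prints one accuracy value per combination, which allows a rough comparison of how label errors interact with feature similarity and class imbalance. Under heavy imbalance, plain accuracy can look high even when the minority class is rarely predicted, so a per-class metric may be a useful addition.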
