AI Performance Based On Learning-Data Labeling Accuracy

Ji-Hoon Lee; Jieun Shin

doi:10.22678/JIC.2024.22.1.177

Journal of Industrial Convergence (산업융합연구)

Volume 22, Issue 1, pp. 177-183, 2024

pISSN 2635-8875 / eISSN 2672-0124

DAEHAN Society of Industrial Management (대한산업경영학회)



Korean title: 인공지능 학습데이터 라벨링 정확도에 따른 인공지능 성능 (AI performance according to the labeling accuracy of AI training data)

Ji-Hoon Lee (이지훈), Dept. of Biomedical Informatics, College of Medicine, Konyang University
Jieun Shin (신지은), Dept. of Biomedical Informatics, College of Medicine, Konyang University; Head of the Healthcare Data Verification Center, Konyang University Hospital

Received: 2023.12.11
Accepted: 2024.01.20
Published: 2024.01.28


Abstract

This study investigates how data quality affects the performance of artificial intelligence (AI). To this end, simulations were used to compare and analyze how the level of labeling error affects AI performance, taking into account the similarity of data features and the imbalance of class composition. Data whose feature variables were highly similar proved more sensitive to labeling accuracy than data whose feature variables had low similarity. In addition, AI accuracy tended to decrease sharply as class imbalance increased. These results are intended to serve as baseline data for establishing quality-evaluation criteria for AI training data and for related research.
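The abstract does not state the dataset, the classifier, or how feature similarity and class imbalance were parameterized, so the following is only a minimal sketch of this kind of simulation under stated assumptions, not the authors' code. It assumes two synthetic 2-D Gaussian classes whose mean offset (`separation`) stands in for feature similarity, a class-count ratio (`imbalance`), labeling errors injected by flipping a random fraction of training labels, and a k-nearest-neighbor classifier (k = 5) scored against a clean test set. All function names and parameters are hypothetical, and the sketch is not expected to reproduce the paper's numbers.

```python
import heapq
import random


def make_data(n_major, separation, imbalance=1.0, rng=None):
    """Draw two classes from unit-variance 2-D Gaussians.

    separation: offset of the class-1 mean on each axis (assumed proxy for
        feature similarity: smaller offset = more similar features).
    imbalance: ratio of class-0 to class-1 sample counts (1.0 = balanced).
    """
    if rng is None:
        rng = random.Random(0)
    n_minor = max(1, round(n_major / imbalance))
    data = []
    for label, n, mean in ((0, n_major, 0.0), (1, n_minor, separation)):
        for _ in range(n):
            data.append(((rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)), label))
    rng.shuffle(data)
    return data


def inject_label_errors(data, error_rate, rng):
    """Simulate labeling errors by flipping a random error_rate fraction of labels."""
    n_flip = round(error_rate * len(data))
    flipped = set(rng.sample(range(len(data)), n_flip))
    return [(x, 1 - y) if i in flipped else (x, y) for i, (x, y) in enumerate(data)]


def knn_predict(train, x, k=5):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    nearest = heapq.nsmallest(
        k, train, key=lambda s: (s[0][0] - x[0]) ** 2 + (s[0][1] - x[1]) ** 2
    )
    return 1 if 2 * sum(y for _, y in nearest) > k else 0


def run_trial(separation, error_rate, imbalance=1.0, n_major=200, seed=0):
    """Train on noisy labels, then score accuracy against a clean test set."""
    rng = random.Random(seed)
    train = make_data(n_major, separation, imbalance, rng)
    test = make_data(n_major // 2, separation, imbalance, rng)
    noisy_train = inject_label_errors(train, error_rate, rng)
    correct = sum(knn_predict(noisy_train, x) == y for x, y in test)
    return correct / len(test)


if __name__ == "__main__":
    for separation in (0.5, 2.0):        # more vs. less similar features
        for imbalance in (1.0, 9.0):     # balanced vs. 9:1 class ratio
            accs = [run_trial(separation, e, imbalance) for e in (0.0, 0.1, 0.2, 0.3)]
            print(f"separation={separation} imbalance={imbalance}: "
                  + ", ".join(f"{a:.3f}" for a in accs))
```

Because flips are drawn uniformly over all training samples, under a 9:1 ratio most flipped labels are majority-class samples relabeled as the minority class. Scoring against clean test labels keeps the measured accuracy tied to the true classes. On imbalanced data, plain accuracy can stay high just by predicting the majority class, so a per-class metric may be worth reporting alongside it.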


