A Method for Prediction of Quality Defects in Manufacturing Using Natural Language Processing and Machine Learning

A Methodology for Predicting Quality Defects at Manufacturing Sites Using Natural Language Processing and Machine Learning

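  • 노정민 (Graduate School of Convergence Information, The Cyber University of Korea)
  • 김용성 (Department of Software Engineering, School of Creative Engineering, The Cyber University of Korea)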
  • Received : 2021.09.02
  • Accepted : 2021.09.21
  • Published : 2021.09.30

Abstract

Quality control is critical at manufacturing sites, and predicting which processes carry a risk of quality defects before they are performed is key to it. However, manual quality control that relies on the expertise of individual engineers runs into human and physical limits, because manufacturing processes are numerous and vary widely. These limits are particularly evident in domains with a very large number of processes, such as the manufacture of major nuclear power plant equipment. This study proposes a method for predicting the risk of quality defects using natural language processing and machine learning. Production data collected over six years at a factory that manufactures main equipment installed in nuclear power plants were used. In the text preprocessing stage, words were mapped to a word dictionary so that domain knowledge is appropriately reflected, and a hybrid algorithm combining n-grams, Term Frequency-Inverse Document Frequency (TF-IDF), and Singular Value Decomposition (SVD) was constructed for sentence vectorization. In the experiment to classify processes at risk of quality defects, k-fold cross-validation was applied, and cases ranging from unigrams to cumulative trigrams were evaluated. To obtain objective experimental results, Naive Bayes and Support Vector Machine were used as classification algorithms; the maximum accuracy and F1-score achieved were 0.7685 and 0.8641, respectively, indicating that the proposed method is effective. The predictions of the proposed method were also compared with the votes of field engineers, and the results revealed that the proposed method outperformed the field engineers. Thus, the method can be implemented for quality control at manufacturing sites.
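The hybrid vectorization described above (dictionary mapping, then n-grams, TF-IDF, and SVD) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the entries in the hypothetical `DOMAIN_MAP`, the whitespace tokenizer (Korean text would need a morphological analyzer instead of `str.split`), the smoothed IDF formula, and the power-iteration SVD are all choices made for this sketch. Each document vector is its TF-IDF row projected onto the top-k right singular vectors, i.e. a row of U_k Σ_k as in latent semantic analysis.

```python
import math
import random
from collections import Counter

# Hypothetical domain dictionary (an assumption for this sketch): maps shop-floor
# abbreviations and synonyms to one canonical term before vectorization.
DOMAIN_MAP = {"wld": "weld", "welding": "weld", "ut": "ultrasonic_test"}


def normalize(text, mapping=DOMAIN_MAP):
    """Lower-case, whitespace-tokenize, and map tokens through the dictionary."""
    return [mapping.get(tok, tok) for tok in text.lower().split()]


def cumulative_ngrams(tokens, max_n):
    """All n-grams for n = 1..max_n (max_n=3 gives a cumulative trigram set)."""
    grams = []
    for n in range(1, max_n + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def tfidf_matrix(docs, max_n):
    """L2-normalized TF-IDF rows over cumulative n-grams, plus the vocabulary."""
    grams = [cumulative_ngrams(normalize(d), max_n) for d in docs]
    vocab = sorted({g for doc in grams for g in doc})
    df = Counter(g for doc in grams for g in set(doc))
    n_docs = len(docs)
    # Smoothed IDF (assumed variant); other IDF formulas are equally plausible.
    idf = {g: math.log((1 + n_docs) / (1 + df[g])) + 1.0 for g in vocab}
    rows = []
    for doc in grams:
        tf = Counter(doc)
        row = [tf[g] * idf[g] for g in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return rows, vocab


def top_right_singular_vectors(A, k, iters=200, seed=0):
    """Truncated SVD: power iteration on A^T A with Gram-Schmidt deflation.

    Eigenvectors of A^T A are the right singular vectors of A. Returns fewer
    than k vectors if the rank of A is smaller than k.
    """
    cols = list(zip(*A))
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    rng = random.Random(seed)
    found = []
    for _ in range(k):
        v = [rng.random() for _ in cols]
        for _ in range(iters):
            w = [sum(g * x for g, x in zip(row, v)) for row in gram]
            for u in found:  # remove directions that were already found
                dot = sum(a * b for a, b in zip(w, u))
                w = [a - dot * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # rank exhausted: no further singular direction
                return found
            v = [x / norm for x in w]
        found.append(v)
    return found


def lsa_embed(rows, vectors):
    """Project TF-IDF rows onto the right singular vectors (= U_k Sigma_k)."""
    return [[sum(a * b for a, b in zip(row, v)) for v in vectors] for row in rows]


if __name__ == "__main__":
    # Hypothetical production-record snippets, not data from the study.
    docs = ["WLD repair after UT indication", "welding of nozzle", "UT of nozzle weld"]
    X, vocab = tfidf_matrix(docs, max_n=2)
    V = top_right_singular_vectors(X, k=2)
    for doc, z in zip(docs, lsa_embed(X, V)):
        print(doc, [round(c, 3) for c in z])
```

With `k` equal to the rank of the TF-IDF matrix the projection loses nothing; a smaller `k` keeps only the dominant latent directions, which is the purpose of the SVD step.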

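At manufacturing sites, it is very important to predict processes at risk of quality defects before they are performed so that quality can be managed in advance. However, approaches that depend on the competence of individual engineers run into human and physical limits as the variety and number of manufacturing processes grow. These limits are especially clear in domains with an extremely broad range of processes, such as the manufacture of major nuclear power plant equipment. This paper presents a method for predicting processes at risk of quality defects at manufacturing sites using natural language processing and machine learning. To this end, text data from production records collected over six years at a factory that manufactures main equipment installed in operating nuclear power plants were used. In the text preprocessing stage, words were mapped to a word dictionary so that domain knowledge would be well reflected, and for sentence vectorization a hybrid algorithm combining N-grams, TF-IDF, and SVD was constructed. Next, in the experiment classifying processes at risk of quality defects, k-fold cross-validation was applied and the experiment was divided into several cases, from unigrams to cumulative trigrams, to ensure objectivity with respect to the dataset. In addition, Naive Bayes (NB) and Support Vector Machine (SVM) were used as classification algorithms, yielding meaningful results. The experiments showed a maximum accuracy of 0.7685 and a maximum F1-score of 0.8641, a reasonably effective level. Finally, by predicting new processes that had never been performed before and comparing the predictions with the votes of field engineers, the study showed that the method can be applied naturally in the field.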
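The evaluation protocol described above (k-fold cross-validation over feature sets from unigrams to cumulative trigrams, scored by accuracy and F1) can be sketched in plain Python as well. The assumptions here are hypothetical and not taken from the study: label 1 marks a defect-risk process, folds are contiguous and unshuffled, out-of-fold predictions are pooled before scoring (averaging per-fold scores is a common alternative), and only a multinomial Naive Bayes classifier is shown; an SVM would be substituted where `MultinomialNB` is constructed. F1 is computed as 2TP / (2TP + FP + FN), which equals the harmonic mean of precision and recall.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, max_n):
    """Cumulative n-grams: every 1-gram up to every max_n-gram of the text."""
    toks = text.lower().split()
    return [" ".join(toks[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(toks) - n + 1)]


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels, max_n):
        self.max_n = max_n
        self.priors = Counter(labels)
        self.counts = defaultdict(Counter)
        for doc, y in zip(docs, labels):
            self.counts[y].update(ngrams(doc, max_n))
        self.vocab = {g for c in self.counts.values() for g in c}
        self.totals = {y: sum(self.counts[y].values()) for y in self.priors}
        return self

    def predict(self, doc):
        n_train = sum(self.priors.values())
        smoothing = self.alpha * len(self.vocab)
        best, best_score = None, -math.inf
        for y, prior in self.priors.items():
            score = math.log(prior / n_train)  # log prior
            for g in ngrams(doc, self.max_n):
                if g in self.vocab:  # n-grams unseen in training are skipped
                    score += math.log((self.counts[y][g] + self.alpha)
                                      / (self.totals[y] + smoothing))
            if score > best_score:
                best, best_score = y, score
        return best


def k_fold_indices(n, k):
    """Contiguous folds of near-equal size (shuffle the data beforehand in practice)."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def accuracy_f1(y_true, y_pred, positive=1):
    """Accuracy and F1 for the positive (defect-risk) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    acc = sum(t == p for t, p in pairs) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return acc, f1


def cross_validate(docs, labels, k, max_n):
    """Train on k-1 folds, predict the held-out fold, then score pooled predictions."""
    preds = [None] * len(docs)
    for test_idx in k_fold_indices(len(docs), k):
        held_out = set(test_idx)
        train = [i for i in range(len(docs)) if i not in held_out]
        model = MultinomialNB().fit([docs[i] for i in train],
                                    [labels[i] for i in train], max_n)
        for i in test_idx:
            preds[i] = model.predict(docs[i])
    return accuracy_f1(labels, preds)


if __name__ == "__main__":
    # Hypothetical toy records; 1 = defect-risk process, 0 = no defect recorded.
    docs = ["weld crack repair", "paint done", "weld crack found", "paint inspection done"]
    labels = [1, 0, 1, 0]
    for max_n in (1, 2, 3):  # unigram, cumulative bigram, cumulative trigram
        print(max_n, cross_validate(docs, labels, k=2, max_n=max_n))
```

Pooling the out-of-fold predictions means every record is scored exactly once by a model that never saw it, so the reported accuracy and F1 cover the whole dataset.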

Acknowledgement

This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (Ministry of Science and ICT) (No. 2020R1G1A1099559).