Finding Unexpected Test Accuracy by Cross Validation in Machine Learning

  • Received : 2021.12.05
  • Published : 2021.12.30

Abstract

Machine Learning (ML) splits data into three parts, typically 60% for training, 20% for validation, and 20% for testing. The split is purely quantitative, rather than selecting each set according to a criterion, even though such a criterion is essential for the adequacy of the test data. ML measures a model's accuracy on the validation set and revises the model until the validation accuracy reaches a certain level. After the validation process, the completed model is tested on the test set, which the model has not yet seen. If the test set covers the model's attributes well, the test accuracy will be close to the model's validation accuracy. To check whether ML's test sets work adequately, we design an experiment that examines whether a model's test accuracy is always close to its validation accuracy, as expected. The experiment builds 100 different SVM models for each of six data sets published in the UCI ML repository. Among the test and validation accuracies of the 600 resulting cases, we find some unexpected cases in which the test accuracy differs greatly from the validation accuracy. Consequently, it is not always true that ML's test set is adequate to assure a model's quality.
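
As an illustration of this protocol, below is a minimal sketch written with scikit-learn (reference 7), not the authors' published code; the Iris data set (reference 15), the RBF kernel, and C=1.0 are assumptions made for the example:

    # Sketch: repeat a random 60/20/20 train/validation/test split 100 times,
    # fit an SVM on each split, and record how far the test accuracy drifts
    # from the validation accuracy. Data set and hyperparameters are assumed.
    import numpy as np
    from sklearn.datasets import load_iris        # Iris: one of the UCI data sets
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    gaps = []
    for seed in range(100):                       # 100 models per data set
        # First cut: 60% training, 40% held out.
        X_tr, X_hold, y_tr, y_hold = train_test_split(
            X, y, train_size=0.6, random_state=seed)
        # Second cut: halve the held-out 40% into 20% validation, 20% test.
        X_val, X_te, y_val, y_te = train_test_split(
            X_hold, y_hold, test_size=0.5, random_state=seed)

        model = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)   # assumed settings
        val_acc = model.score(X_val, y_val)
        test_acc = model.score(X_te, y_te)
        gaps.append(abs(val_acc - test_acc))

    print(f"mean |val-test| gap: {np.mean(gaps):.3f}, max: {np.max(gaps):.3f}")

A large maximum gap across the 100 runs corresponds to the unexpected cases the experiment looks for: a test set that meets the 20% quota yet fails to cover the model's attributes.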

Acknowledgement

This work was supported by the Hyupsung University Research Grant of 2020.

References

  1. Yun Xu and Royston Goodacre. On Splitting Training and Validation Set: A Comparative Study of Cross-Validation, Bootstrap and Systematic Sampling for Estimating the Generalization Performance of Supervised Learning. Journal of Analysis and Testing. 2:249-262. (2018) https://doi.org/10.1007/s41664-018-0068-2
  2. Andrew Ng. Model Selection and Train/Validation/Test sets. Machine Learning @ Coursera.
  3. Shin Nakajima and Kai Ngoc Bui. Dataset Coverage for Testing Machine Learning Computer Programs. Proceedings of the 23rd Asia-Pacific Software Engineering Conference; 2016 Dec 6-9; New Zealand: IEEE. (2016)
  4. Arnab Sharma and Heike Wehrheim. Testing Machine Learning Algorithms for Balanced Data Usage. Proceedings of the 12th International Conference on Software Testing, Verification and Validation; 2019 April 22-27; China: IEEE. (2019)
  5. Senthil Mani and Anush Sankaran. Coverage Testing of Deep Learning Models using Dataset Characterization. arXiv preprint arXiv:1911.07309. (2019)
  6. Du Zhang and Jeffrey Tsai. Machine Learning Applications in Software Engineering. World Scientific; (2005)
  7. F. Pedregosa et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. 12(85):2825-2830. (2011)
  8. D. Albanese, S. Merler, G. Jurman, and R. Visintainer. MLPy: high-performance Python package for predictive modelling. Proceedings of the Workshop on Machine Learning Open Source Software; 2008 December 12; Canada: PASCAL. (2008)
  9. T. Schaul, J. Bayer, D. Wierstra, Y. Sun, M. Felder, F. Sehnke, T. Ruckstiess, and J. Schmidhuber. PyBrain. The Journal of Machine Learning Research. 11:743-746. (2010)
  10. M. Hanke, Y.O. Halchenko, P.B. Sederberg, S.J. Hanson, J.V. Haxby, and S. Pollmann. PyMVPA: A Python toolbox for multivariate pattern analysis of fMRI data. Neuroinformatics. 7(1):37-53. (2009)
  11. T. Zito, N. Wilbert, L. Wiskott, and P. Berkes. Modular toolkit for Data Processing (MDP): A Python data processing framework. Frontiers in Neuroinformatics. January (2008)
  12. S. Sonnenburg, G. Ratsch, S. Henschel, C. Widmer, J. Behr, A. Zien, F. de Bona, A. Binder, C. Gehl, and V. Franc. The SHOGUN machine learning toolbox. Journal of Machine Learning Research. 11:1799-1802. (2010)
  13. I. Guyon, S.R. Gunn, A. Ben-Hur, and G. Dror. Result analysis of the NIPS 2003 feature selection challenge. Proceedings of the 17th International Conference on Neural Information Processing Systems; 2004 December; Vancouver, Canada: MIT Press. (2004)
  14. D. Dua and C. Graff. UCI Machine Learning Repository. Available from: http://archive.ics.uci.edu/ml (2017)
  15. R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Human Genetics. 7(2):179-188 (1936)
  16. R.O. Duda and P.E. Hart. Pattern Classification and Scene Analysis. John Wiley & Sons: New York. (1973)
  17. B. V. Dasarathy. Nosing around the neighbourhood: A new system structure and classification rule for recognition in partially exposed environments. IEEE Transactions on Pattern Analysis and Machine Intelligence. PAMI-2(1):67-71 (1980) https://doi.org/10.1109/TPAMI.1980.4766972
  18. G.W. Gates. The reduced nearest neighbor rule. IEEE Transactions on Information Theory. 18(3): 431-433 (1972) https://doi.org/10.1109/TIT.1972.1054809
  19. D. Tao, X. Tang, X. Li, and X. Wu. Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence. 28(7):1088-1099. (2006) https://doi.org/10.1109/TPAMI.2006.134
  20. G. Valentini, M. Muselli, and F. Ruffino. Cancer recognition with bagged ensembles of support vector machines. Neurocomputing. 56:461-466. (2004)
  21. H. Kim, P. Howland, and H. Park. Dimension Reduction in Text Classification with Support Vector Machines. Journal of Machine Learning Research. 6:37-53. (2005)
  22. T. Bellotti and J. Crook. Support vector machines for credit scoring and discovery of significant features. Expert Systems with Applications. 36:3302-3308. (2009) https://doi.org/10.1016/j.eswa.2008.01.005