MARGIN-BASED GENERALIZATION FOR CLASSIFICATIONS WITH INPUT NOISE

  • Choe, Hi Jun (Department of Mathematics, Yonsei University);
  • Koh, Hayeong (Software Testing & Certification Laboratory, Telecommunication Technology Association);
  • Lee, Jimin (Center for Mathematical Analysis & Computation, Yonsei University)
  • Received: 2020.07.14
  • Accepted: 2021.11.02
  • Published: 2022.03.01

Abstract

Although machine learning achieves state-of-the-art performance in a variety of fields, a theoretical understanding of how it works is still lacking. Recently, theoretical approaches have been actively studied, and one line of results concerns the margin and its distribution. In this paper, we focus on the role of the margin under perturbations of inputs and parameters. We prove generalization bounds for two cases, a linear model for binary classification and neural networks for multi-class classification, when the inputs carry normally distributed random noise. For binary classification, the additional generalization term caused by the random noise is related to the margin and depends exponentially on the inverse of the noise level. For neural networks, the additional generalization term depends on (input dimension) × (norms of the input and the weights). These results are obtained within the PAC-Bayesian framework. By considering random noise and the margin together, this paper contributes to a better understanding of model sensitivity and to the construction of robust generalization.
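As a minimal sketch of the mechanism described above (illustrative code, not taken from the paper): for a unit-norm linear classifier with clean margin gamma, additive Gaussian input noise of level sigma shifts the margin by a one-dimensional N(0, sigma^2) variable, so the probability that a correctly classified point flips is Phi(-gamma/sigma), roughly exp(-gamma^2 / (2 sigma^2)). The function names and parameters below are assumptions made for this sketch.

    import numpy as np
    from math import erfc, sqrt

    # For w with ||w|| = 1 and a point x classified with margin
    # gamma = y * (w @ x) > 0, noise eps ~ N(0, sigma^2 I) perturbs the
    # margin by y * (w @ eps) ~ N(0, sigma^2).  The flip probability is
    # Phi(-gamma / sigma), exponentially small in (gamma / sigma)^2.

    def flip_probability(gamma: float, sigma: float) -> float:
        """Exact P(gamma + N(0, sigma^2) < 0) via the Gaussian CDF."""
        return 0.5 * erfc(gamma / (sigma * sqrt(2.0)))

    def flip_probability_mc(gamma: float, sigma: float,
                            n: int = 1_000_000, seed: int = 0) -> float:
        """Monte Carlo check of the same probability."""
        rng = np.random.default_rng(seed)
        return float(np.mean(gamma + sigma * rng.standard_normal(n) < 0.0))

    if __name__ == "__main__":
        gamma = 1.0  # clean margin
        for sigma in (0.1, 0.5, 1.0):
            print(f"sigma={sigma:4.1f}  exact={flip_probability(gamma, sigma):.3e}"
                  f"  mc={flip_probability_mc(gamma, sigma):.3e}")

The exact and Monte Carlo values agree, and shrinking sigma (or growing gamma) drives the noise-induced error toward zero at a Gaussian-tail rate, which matches the flavor of the margin-dependent, exponentially decaying noise term the abstract describes.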

Acknowledgement

Hi Jun Choe and Hayeong Koh were supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Korean government (No. 2015R1A5A1009350, No. 20181A2A3074566). Jimin Lee was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Korean government (No. 2020R1I1A1A01071731).
