Spectral clustering based on the local similarity measure of shared neighbors

  • Cao, Zongqi (Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University)
  • Chen, Hongjia (Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University)
  • Wang, Xiang (Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University)
  • Received : 2021.07.23
  • Accepted : 2022.02.05
  • Published : 2022.10.10

Abstract

Spectral clustering has become a typical and efficient clustering method used in a variety of applications. The critical step of spectral clustering is the similarity measurement, which largely determines the performance of the spectral clustering method. In this paper, we propose a novel spectral clustering algorithm based on the local similarity measure of shared neighbors. This similarity measurement exploits the local density information between data points based on the weight of the shared neighbors in a directed k-nearest neighbor graph with only one parameter k, that is, the number of nearest neighbors. Numerical experiments on synthetic and real-world datasets demonstrate that our proposed algorithm outperforms other existing spectral clustering algorithms in terms of the clustering performance measured via the normalized mutual information, clustering accuracy, and F-measure. As an example, the proposed method can provide an improvement of 15.82% in the clustering performance for the Soybean dataset.
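
The pipeline the abstract describes (a directed k-nearest neighbor graph, a similarity built from shared neighbors, then spectral clustering) can be illustrated with a minimal sketch. The abstract does not state the weighting formula, so the similarity below is an assumption: the plain count of shared k-nearest neighbors divided by k, not the authors' weighted measure. The function names, the normalized-affinity embedding, and the deterministic k-means initialization are likewise illustrative choices rather than details from the paper.

```python
import numpy as np


def knn_indices(X, k):
    """Return an (n, k) array: the k nearest neighbors of each point, excluding itself."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def shared_neighbor_similarity(X, k):
    """ASSUMED similarity: S[i, j] = |kNN(i) & kNN(j)| / k (unweighted count)."""
    n = X.shape[0]
    nbrs = knn_indices(X, k)
    A = np.zeros((n, n))
    # Directed kNN graph: A[i, j] = 1 when j is among the k nearest neighbors of i.
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    S = (A @ A.T) / k  # (A A^T)[i, j] counts neighbors shared by i and j
    np.fill_diagonal(S, 0.0)
    return S


def spectral_embedding(S, n_clusters):
    """Row-normalized top eigenvectors of D^{-1/2} S D^{-1/2}."""
    deg = np.maximum(S.sum(axis=1), 1e-12)  # guard against isolated points
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    M = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)  # eigenvalues come back in ascending order
    Y = vecs[:, -n_clusters:]
    norms = np.linalg.norm(Y, axis=1, keepdims=True)
    return Y / np.maximum(norms, 1e-12)


def kmeans(Y, n_clusters, n_iter=100):
    """Plain Lloyd iterations with a deterministic farthest-point start."""
    centers = [Y[0]]
    for _ in range(1, n_clusters):
        d = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dist, axis=1)
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def snn_spectral_clustering(X, k, n_clusters):
    """Cluster the rows of X; k is the only neighborhood parameter."""
    S = shared_neighbor_similarity(np.asarray(X, dtype=float), k)
    return kmeans(spectral_embedding(S, n_clusters), n_clusters)


# Toy data (hypothetical): two well-separated chains of points on a line.
X = np.concatenate([np.arange(10.0), 100.0 + np.arange(10.0)])[:, None]
labels = snn_spectral_clustering(X, k=2, n_clusters=2)
print(labels)
```

In this toy example no point in one chain shares a neighbor with a point in the other, so the similarity matrix is block diagonal and the two groups separate in the embedding. On real data the choice of k matters, and the paper's weighted measure may behave differently from this unweighted count.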

Acknowledgement

The authors thank the anonymous reviewers for their very helpful comments and suggestions. This work was supported by the National Natural Science Foundation of China under Grant Nos. 11961048, 12001262, and 11801258, and by the Jiangxi Provincial Natural Science Foundation under Grant No. 20181ACB20001.
