Spectral clustering by considering stationary distribution vector and transition matrix

Document Type : Research Paper

Authors

1 Department of Applied Mathematics, Graduate University of Advanced Technology, Kerman, Iran.

2 Faculty of Electrical and Computer Engineering, Graduate University of Advanced Technology, Kerman, Iran.

10.22072/wala.2023.1989619.1413

Abstract

 One of the popular methods of data clustering is spectral clustering. The main step of this method is constructing a graph representation of the data set and its similarity matrix. The similarity matrices which are constructed based on some important points not all data points, are among the main approaches. In this paper, the stationary distribution for a random walk on a weighted graph $G$ is considered to find anchor points of the data set. Then we build the similarity matrix based on the anchor nodes and the weighted random walk transition matrix. After that, spectral clustering is applied on the gained similarity matrix. We propose the theoretical discussions and then we evaluate our method on benchmarks.        

Keywords


[1] D. Cai and X. Chen, Large scale spectral clustering via landmark-based sparse representation, IEEET Cybernetics, 45(8) 
     (2015), 1669-1680.
[2] Y. Chen, Ch.-G. Li and Ch. You, Stochastic Sparse Subspace Clustering, 2020 IEEE/CVF Conference on Computer Vision 
     and Pattern Recognition (CVPR), Seattle, WA, USA}, (2020), 4154-4163.
[3] T. Du, G. Wen, Zh. Cai, W. Zheng, M. Tan and Y. Li, Spectral clustering algorithm combining local covariance matrix with
      normalization, Neural Computing and Applications, 32 (2018), 6611-6618.
[4] R. Hu, X. Zhu, D. Cheng, W. He, Y. Yan, J. Song and Sh. Zhang, Graph self-representation method for unsupervised
     feature selection, Neuro comput., 220 (2017), 130-137.
[5] P. Jain, Sh.M. Kakade, R. Kidambi, P. Netrapalli and A. Sidford, Parallelizing stochastic gradient descent for least 
     squares regression: mini-batching, averaging, and model misspecification, J. Mach. Learn Res., 18(223) (2018), 1-42.
[6] Zh. Kang, X. Zhao, Ch. Peng, H. Zhu, J.T. Zhou, X. Peng, W. Chen and Z. Xu, Partition level multiview subspace
     clustering, Neural Networks, 122 (2020), 279-288.
[7] W. Liu, J.Sh.-Fu Chang, Large graph construction for scalable semi-supervised learning, Proceedings of the 27th
      International Conference on Machine Learning, (2010), 679-686.
[8] H. Motallebi and M. Jamshidi, A distance scaling method to improve spectral clustering of data with different
     densities, International Journal of Data Science, 6(4) (2022), 328-347.
[9] H. Motallebi, R. Nasihatkon and M. Jamshidi, A local mean-based distance measure for spectral clustering, Pattern 
      Analysis and Applications, 25 (2022), 351-359.
[10] A.Y. Ng, M.I. Jordan and Y. Weiss, On spectral clustering: Analysis and an algorithm, Advances in Neural Information
       Processing Systems, 14 (2001), 849-856.
[11] X. Peng, J. Feng, J.T. Zhou, Y. Lei and Sh. Yan, Deep sparse subspace clustering, Computer vision and pattern
       recognition, (2017), arXiv:1709.08374 [cs.CV].
[12] A.P. Riascos and J.L. Mateos, Random walks on weighted networks: Exploring local and non-local navigation
        strategies, (2019), arXiv: Statistical Mechanics, 1-21.
[13] F. Saberi-Movahed, M. Mohammadifard, A. Mehrpooya, M. Rezaei-Ravari, K. Berahmand, M. Rostami, S. Karami, M. 
       Najafzadeh, D. Hajinezhad, M. Jamshidi, F. Abedi, M. Mohammadifard, E. Farbod, F. Safavi, M. Dorvash, N. Mottaghi-
       Dastjerdi, Sh. Vahedi, M. Eftekhari, F. Saberi-Movahed, H. Alinejad-Rokny, Sh. S. Band and Iman Tavassoly, Decoding 
        clinical biomarker space of covid-19: exploring matrix factorization-based feature selection methods, Computers in
        Biology and Medicine, 146 (2022), 105426.
[14] F. Saberi-Movahed, M. Rostami, K. Berahmand, S. Karami, P. Tiwari, M. Oussalah and Sh.S. Band, Dual Regularized 
       Unsupervised Feature Selection Based on Matrix Factorization and Minimum Redundancy with application in gene
       selection, Knowledge-Based Systems, 256(28) (2022), 109884.
[15] J.M. Santos and M. Embrechts, On the use of the adjusted Rand index as a metric for evaluating supervised
       classification, Artificial Neural Networks-ICANN, (2009), 175--184.
[16] P. Satorras, C. Castellano, P.V. Mieghem and A. Vespignani, Epidemic processes in complex networks, Reviews of 
        Modern Physics, 87 (2015), 925-979.
[17] Sh. Wang, X. Yuan, T. Yao, Sh. Yan and J. Shen, Efficient subspace segmentation via quadratic programming, AAAI 
       conference on artificial intelligence, (2011), 519-524.
[18] W. Zhu, F. Nie and X. Li, Fast spectral clustering with efficient large graph construction, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (2017), 2492-2496.
[19] X. Zhu, Sh. Zhang, R. Hu, Y. Zhu and J. Song, Local and global structure preservation for robust unsupervised spectral
       feature selection, IEEE Trans Knowl Data Eng., 30(3) (2018), 517-529.
[20] X. Zhu, Sh. Zhang, Y. Li, J. Zhang and L. Yang, Low-rank sparse subspace for spectral clustering, IEEE Transactions on
       Knowledge and Data Engineering, 31(8) (2019), 1532-1543.
[21] J. Zhu and H. Wang, An improved K-means clustering algorithm, IEEE International Conference on Information 
       Management and Engineering, 9(1) (2010), 44-46.