TY - JOUR
T1 - Multiple clusterings of heterogeneous information networks
AU - Wei, Shaowei
AU - Yu, Guoxian
AU - Wang, Jun
AU - Domeniconi, Carlotta
AU - Zhang, Xiangliang
N1 - KAUST Repository Item: Exported on 2021-06-15
N1 - Acknowledgements: This work is partially supported by NSFC (Nos. 62031003, 62072380 and 61872300).
PY - 2021/6/2
Y1 - 2021/6/2
N2 - Traditional clustering algorithms focus on a single clustering result; as such, they cannot explore potential diverse patterns of complex real-world data. To deal with this problem, approaches that exploit meaningful alternative clusterings in data have been developed in recent years. Existing algorithms, including single-view/multi-view multiple clustering methods, are designed for applications with i.i.d. data samples, and cannot handle the data samples with dependency presented in networks, especially in heterogeneous information networks (HIN). In this paper, we propose a framework (NetMCs) that can explore multiple clusterings in HIN. Specifically, NetMCs adopts a set of meta-path schemes with different semantics on HIN, and considers each meta-path scheme as a base clustering aspect. Guided by the meta-path schemes, NetMCs then introduces a variation of the skip-gram framework that can jointly optimize multiple clustering aspects, and simultaneously obtain the respective embedding representations and individual clusterings therein. To reduce redundancy between alternative clusterings, NetMCs utilizes an explicit regularization term to control the embedding diversity of the same nodes among different clustering aspects. Experiments on benchmark HIN datasets confirm the performance of NetMCs in generating multiple clusterings with high quality and diversity.
UR - http://hdl.handle.net/10754/669566
UR - https://link.springer.com/10.1007/s10994-021-06000-y
UR - http://www.scopus.com/inward/record.url?scp=85107413786&partnerID=8YFLogxK
U2 - 10.1007/s10994-021-06000-y
DO - 10.1007/s10994-021-06000-y
M3 - Article
SN - 1573-0565
JO - Machine Learning
JF - Machine Learning
ER -