Hashing has been widely studied for big data retrieval due to its low storage cost and fast query speed. Zero-shot hashing (ZSH) aims to learn a hashing model that is trained using only samples from seen categories, but can generalize well to samples of unseen categories. ZSH generally uses category attributes to seek a semantic embedding space to transfer knowledge from seen categories to unseen ones. As a result, it may perform poorly when labeled data are insufficient. ZSH methods are mainly designed for single-modality data, which prevents their application to the widely spread multi-modal data. On the other hand, existing cross-modal hashing solutions assume that all the modalities share the same category labels, while in practice the labels of different data modalities may be different. To address these issues, we propose a general Cross-modal Zero-shot Hashing (CZHash) solution to effectively leverage unlabeled and labeled multi-modality data with different label spaces. CZHash first quantifies the composite similarity between instances using label and feature information. It then defines an objective function to achieve deep feature learning compatible with the composite similarity preserving, category attribute space learning, and hashing coding function learning. CZHash further introduces an alternative optimization procedure to jointly optimize these learning objectives. Experiments on benchmark multi-modal datasets show that CZHash significantly outperforms related representative hashing approaches both on effectiveness and adaptability.
Bibliographical noteKAUST Repository Item: Exported on 2020-10-01
Acknowledgements: We appreciate the authors who kindly share their codes with us for experiments. This research is supported by NSFC (61872300 and 61873214), Fundamental Research Funds for the Central Universities (XDJK2019B024 and XDJK2019D019), Natural Science Foundation of CQ CSTC (cstc2018jcyjAX0228) and by the King Abdullah University of Science and Technology (KAUST), Saudi Arabia.