Abstract
Face naming in TV series or movies typically relies on subtitle/script alignment to obtain the time stamps of the names, which are then assigned to the faces. We study the problem of face naming in videos when subtitles are not available. To this end, we divide the problem into two tasks: face clustering, which groups the faces depicting a given person into a cluster, and name assignment, which associates a name with each face. Each task is formulated as a structured prediction problem and modeled by a hidden conditional random field (HCRF). We argue that the two tasks are correlated, so that the output of each provides prior knowledge about the target prediction of the other. The two HCRFs are coupled in a unified graphical model, called a coupled HCRF, in which the joint dependence between cluster labels and face-name associations is naturally embedded in the correlation between the two HCRFs. We provide an effective algorithm that optimizes the two HCRFs iteratively, and the performance of both tasks on real-world data improves.
Original language | English (US) |
---|---|
Pages (from-to) | 5780-5792 |
Number of pages | 13 |
Journal | IEEE Transactions on Image Processing |
Volume | 25 |
Issue number | 12 |
State | Published - Aug 18 2016 |
Externally published | Yes |
Bibliographical note
KAUST Repository Item: Exported on 2020-10-01

Acknowledgements: This work was supported in part by the 863 Program under Grant 2014AA015100, in part by the National Natural Science Foundation of China under Grant 61332016, Grant 61572500, and Grant 61379100, and in part by the DARPA PerSEAS Program under Grant HR0011-10-C-0112. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Nikolaos V. Boulgouris.