Weakly-supervised and fully-supervised object detection each have a shortcoming: unsatisfactory performance and expensive annotations, respectively. As a result, leveraging partially labeled images to train an object detector cost-effectively has attracted much attention. In this paper, we formulate this challenging task as a missing bounding-boxes object detection problem. Specifically, we develop a pseudo ground truth mining (PGTM) procedure that automatically finds the missing bounding boxes of unlabeled instances in the training data, which we call pseudo ground truths, and then combine the mined pseudo ground truths with the labeled annotations to train a fully-supervised object detector. Furthermore, we propose an incremental learning (IL) framework that gradually incorporates the results of the trained fully-supervised detector to improve the performance of missing bounding-boxes object detection. More importantly, we find an effective way to label massive numbers of images with limited labor and funds, which is crucial when building a large-scale weakly/webly labeled dataset for object detection. Extensive experiments on the PASCAL VOC and COCO benchmarks demonstrate that our proposed method narrows the gap between fully-supervised and weakly-supervised object detectors, and that it outperforms the previous state-of-the-art weakly-supervised detectors by a large margin (more than 3% absolute mAP) when the missing rate equals 0.9. Moreover, with 30% of the bounding-box annotations missing, our proposed method achieves performance comparable to some fully-supervised detectors.
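The abstract does not spell out how pseudo ground truths are selected, so the following is only an illustrative sketch of the general idea of combining mined boxes with labeled annotations. It assumes, hypothetically, that a detector's high-confidence predictions are kept as pseudo ground truths when they do not overlap an already labeled or already mined box. The function names, the `score_thresh` and `iou_thresh` parameters, and the selection rule itself are assumptions made for illustration, not the PGTM procedure as published. Class labels are ignored for brevity.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mine_pseudo_ground_truths(
    labeled: List[Box],
    detections: List[Tuple[Box, float]],
    score_thresh: float = 0.8,  # hypothetical confidence cut-off
    iou_thresh: float = 0.5,    # hypothetical overlap cut-off
) -> List[Box]:
    """Keep confident detections that do not duplicate a known box.

    Assumed rule (not taken from the paper): visit detections from the
    highest score down, drop those below `score_thresh`, and drop those
    overlapping a labeled or already mined box by at least `iou_thresh`.
    """
    pseudo: List[Box] = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if score < score_thresh:
            continue
        if any(iou(box, known) >= iou_thresh for known in labeled + pseudo):
            continue
        pseudo.append(box)
    return pseudo


def build_training_boxes(labeled: List[Box], pseudo: List[Box]) -> List[Box]:
    """Combine labeled annotations with mined pseudo ground truths."""
    return list(labeled) + list(pseudo)
```

In an incremental setting, one could retrain the detector on the output of `build_training_boxes`, run it again on the training images, and repeat the mining step; the number of rounds and the thresholds would need to be tuned, and none of these values come from the paper.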
|IEEE Transactions on Circuits and Systems for Video Technology
|Published - 2019