Abstract
Depth estimation from a single image in the wild remains a challenging problem. One main obstacle is the lack of high-quality training data for images in the wild. In this paper we propose a method to automatically generate such data through Structure-from-Motion (SfM) on Internet videos. The core of this method is a Quality Assessment Network that identifies high-quality reconstructions obtained from SfM. Using this method, we collect single-view depth training data from a large number of YouTube videos and construct a new dataset called YouTube3D. Experiments show that YouTube3D is useful in training depth estimation networks and advances the state of the art of single-view depth estimation in the wild.
Original language | English (US) |
---|---|
Title of host publication | 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) |
Publisher | IEEE |
Pages | 5597-5606 |
Number of pages | 10 |
ISBN (Print) | 9781728132938 |
State | Published - Jan 9 2020 |
Externally published | Yes |
Bibliographical note
KAUST Repository Item: Exported on 2022-06-30
Acknowledged KAUST grant number(s): OSR-2015-CRG4-2639
Acknowledgements: This publication is based upon work partially supported by the National Science Foundation under Grant No. 1617767, the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. OSR-2015-CRG4-2639, and a gift from Google.
This publication acknowledges KAUST support, but has no KAUST affiliated authors.