Learning single-image depth from videos using quality assessment networks

Weifeng Chen, Shengyi Qian, Jia Deng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

50 Scopus citations


Depth estimation from a single image in the wild remains a challenging problem. One main obstacle is the lack of high-quality training data for images in the wild. In this paper we propose a method to automatically generate such data through Structure-from-Motion (SfM) on Internet videos. The core of this method is a Quality Assessment Network that identifies high-quality reconstructions obtained from SfM. Using this method, we collect single-view depth training data from a large number of YouTube videos and construct a new dataset called YouTube3D. Experiments show that YouTube3D is useful in training depth estimation networks and advances the state of the art of single-view depth estimation in the wild.
Original language: English (US)
Title of host publication: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Number of pages: 10
ISBN (Print): 9781728132938
State: Published - Jan 9 2020
Externally published: Yes

Bibliographical note

KAUST Repository Item: Exported on 2022-06-30
Acknowledged KAUST grant number(s): OSR-2015-CRG4-2639
Acknowledgements: This publication is based upon work partially supported by the National Science Foundation under Grant No. 1617767, the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. OSR-2015-CRG4-2639, and a gift from Google.
This publication acknowledges KAUST support, but has no KAUST affiliated authors.

