Abstract
We propose DeepV2D, an end-to-end deep learning architecture for predicting depth from video. DeepV2D combines the representation ability of neural networks with the geometric principles governing image formation. We compose a collection of classical geometric algorithms, which are converted into trainable modules and combined into an end-to-end differentiable architecture. DeepV2D interleaves two stages: motion estimation and depth estimation. During inference, motion and depth estimation are alternated and converge to accurate depth.
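The alternation described above can be sketched as a simple fixed-point loop. The sketch below is purely illustrative: `estimate_motion` and `estimate_depth` are hypothetical placeholders standing in for DeepV2D's trainable motion and depth modules, with dummy numeric updates chosen so the iteration visibly converges.

```python
def estimate_motion(frames, depth):
    """Placeholder for the motion module: in DeepV2D this would be a
    trainable network refining camera poses given the current depth."""
    return [0.5 * depth for _ in frames]  # dummy per-frame update

def estimate_depth(frames, motion):
    """Placeholder for the depth module: in DeepV2D this would build a
    cost volume from the estimated poses and regress depth."""
    return 1.0 + 0.5 * sum(motion) / len(motion)  # dummy update

def deepv2d_inference(frames, n_iters=10):
    """Alternate motion and depth estimation until the estimates settle."""
    depth = 1.0  # coarse initialization
    for _ in range(n_iters):
        motion = estimate_motion(frames, depth)
        depth = estimate_depth(frames, motion)
    return depth
```

With these dummy updates the loop contracts toward a fixed point (here 4/3), mirroring how the real system's alternating refinement converges to an accurate depth map.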
| Original language | English (US) |
| --- | --- |
| Title of host publication | 8th International Conference on Learning Representations, ICLR 2020 |
| Publisher | International Conference on Learning Representations, ICLR |
| State | Published - Jan 1 2020 |
| Externally published | Yes |
Bibliographical note
KAUST Repository Item: Exported on 2023-04-05
Acknowledged KAUST grant number(s): OSR-2015-CRG4-2639
Acknowledgements: We would like to thank Zhaoheng Zheng for helping with baseline experiments. This work was partially funded by the Toyota Research Institute, the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. OSR-2015-CRG4-2639, and the National Science Foundation under Grant No. 1617767.
This publication acknowledges KAUST support, but has no KAUST affiliated authors.