Abstract
Understanding indoor scene structure from a single RGB image is useful for a wide variety of applications ranging from the editing of scenes to the mining of statistics about space utilization. Most efforts in scene understanding focus on extraction of either dense information such as pixel-level depth or semantic labels, or very sparse information such as bounding boxes obtained through object detection. In this paper we propose the concept of a scene map, a coarse scene representation, which describes the locations of the objects present in the scene from a top-down view (i.e., as they are positioned on the floor), as well as a pipeline to extract such a map from a single RGB image. To this end, we use a synthetic rendering pipeline, which supplies an adapted CNN with virtually unlimited training data. We quantitatively evaluate our results, showing that we clearly outperform a dense baseline approach, and argue that scene maps provide a useful representation for abstract indoor scene understanding.
Original language | English (US) |
---|---|
Title of host publication | VMV 2016 - Vision, Modeling and Visualization |
Editors | Dieter Fellner |
Publisher | Eurographics Association |
Pages | 45-52 |
Number of pages | 8 |
ISBN (Electronic) | 9783038680253 |
State | Published - 2016 |
Event | 21st International Symposium on Vision, Modeling and Visualization, VMV 2016 - Bayreuth, Germany; Duration: Oct 10 2016 → Oct 12 2016 |
Publication series
Name | VMV 2016 - Vision, Modeling and Visualization |
---|---|
Other
Other | 21st International Symposium on Vision, Modeling and Visualization, VMV 2016 |
---|---|
Country/Territory | Germany |
City | Bayreuth |
Period | Oct 10 2016 → Oct 12 2016 |
Bibliographical note
Publisher Copyright: © 2016 The Author(s). Eurographics Proceedings © 2016 The Eurographics Association.
ASJC Scopus subject areas
- Computer Vision and Pattern Recognition
- Modeling and Simulation