Scene structure inference through scene map estimation

Moos Hueting, Viorica Pătrăucean, Maks Ovsjanikov, Niloy J. Mitra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations


Understanding indoor scene structure from a single RGB image is useful for a wide variety of applications ranging from the editing of scenes to the mining of statistics about space utilization. Most efforts in scene understanding focus on extraction of either dense information such as pixel-level depth or semantic labels, or very sparse information such as bounding boxes obtained through object detection. In this paper we propose the concept of a scene map, a coarse scene representation, which describes the locations of the objects present in the scene from a top-down view (i.e., as they are positioned on the floor), as well as a pipeline to extract such a map from a single RGB image. To this end, we use a synthetic rendering pipeline, which supplies an adapted CNN with virtually unlimited training data. We quantitatively evaluate our results, showing that we clearly outperform a dense baseline approach, and argue that scene maps provide a useful representation for abstract indoor scene understanding.
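The abstract describes a scene map as a coarse, top-down description of where objects stand on the floor. As a minimal sketch of what such a representation might look like, the snippet below rasterises floor-plane object positions into a per-class occupancy grid; the grid resolution, room extent, and object categories are illustrative assumptions, not the authors' actual data structures or values.

```python
import numpy as np

# Hypothetical constants, chosen only for illustration.
GRID_SIZE = 16          # coarse top-down grid (16 x 16 cells)
ROOM_EXTENT = 4.0       # assumed room size in metres, for normalisation
CLASSES = ["chair", "table", "sofa"]  # hypothetical object categories

def make_scene_map(objects):
    """Rasterise floor-plane object positions into a per-class top-down grid.

    objects: list of (class_name, x, y) with x, y in [0, ROOM_EXTENT).
    Returns an array of shape (len(CLASSES), GRID_SIZE, GRID_SIZE) where a
    cell is 1 if an object of that class stands on it.
    """
    scene_map = np.zeros((len(CLASSES), GRID_SIZE, GRID_SIZE), dtype=np.uint8)
    for name, x, y in objects:
        c = CLASSES.index(name)
        # Map continuous floor coordinates to a grid cell, clamping to bounds.
        i = min(int(y / ROOM_EXTENT * GRID_SIZE), GRID_SIZE - 1)
        j = min(int(x / ROOM_EXTENT * GRID_SIZE), GRID_SIZE - 1)
        scene_map[c, i, j] = 1
    return scene_map

# Example: two chairs and a table placed on the floor plan.
objects = [("chair", 0.5, 0.5), ("table", 2.0, 2.0), ("chair", 3.9, 3.9)]
sm = make_scene_map(objects)
```

A dense CNN output could be supervised against such a grid directly, which is one way a coarse top-down target like this differs from pixel-level depth or semantic labels.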

Original language: English (US)
Title of host publication: VMV 2016 - Vision, Modeling and Visualization
Editors: Dieter Fellner
Publisher: Eurographics Association
Number of pages: 8
ISBN (Electronic): 9783038680253
State: Published - 2016
Event: 21st International Symposium on Vision, Modeling and Visualization, VMV 2016 - Bayreuth, Germany
Duration: Oct 10, 2016 - Oct 12, 2016

Publication series

Name: VMV 2016 - Vision, Modeling and Visualization


Other: 21st International Symposium on Vision, Modeling and Visualization, VMV 2016

Bibliographical note

Publisher Copyright:
© 2016 The Author(s). Eurographics Proceedings © 2016 The Eurographics Association.

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Modeling and Simulation


