3DRefTransformer: Fine-Grained Object Identification in Real-World Scenes Using Natural Language

Ahmed Abdelreheem, Ujjwal Upadhyay, Ivan Skorokhodov, Rawan Al Yahya, Jun Chen, Mohamed Elhoseiny

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

11 Scopus citations

Abstract

In this paper, we study fine-grained 3D object identification in real-world scenes described by a textual query. The task aims to discriminatively understand an instance of a particular 3D object described by natural language utterances among other instances of 3D objects of the same class appearing in a visual scene. We introduce the 3DRefTransformer net, a transformer-based neural network that identifies 3D objects described by linguistic utterances in real-world scenes. The network's input is 3D object segmented point cloud images representing a real-world scene and a language utterance that refers to one of the scene objects. The goal is to identify the referred object. Compared to the state-of-the-art models that are mostly based on graph convolutions and LSTMs, our 3DRefTrans-former net offers two key advantages. First, it is an end-to-end transformer model that operates both on language and 3D visual objects. Second, it has a natural ability to ground textual terms in the utterance to the learning representation of 3D objects in the scene. We further incorporate object pairwise spatial relation loss and contrastive learning during model training. We show in our experiments that our model improves the performance upon the current SOTA significantly on Referit3D Nr3D and Sr3D datasets. Code and Models will be made publicly available at https://vision-cair.github.io/3dreftransformer/.

Original languageEnglish (US)
Title of host publicationProceedings - 2022 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2022
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages607-616
Number of pages10
ISBN (Electronic)9781665409155
DOIs
StatePublished - 2022
Event22nd IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2022 - Waikoloa, United States
Duration: Jan 4 2022Jan 8 2022

Publication series

NameProceedings - 2022 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2022

Conference

Conference22nd IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2022
Country/TerritoryUnited States
CityWaikoloa
Period01/4/2201/8/22

Bibliographical note

Publisher Copyright:
© 2022 IEEE.

Keywords

  • 3D Computer Vision Large-scale Vision Applications
  • Analysis and Understanding
  • Object Detection/Recognition/Categorization
  • Scene Understanding
  • Vision and Languages
  • Visual Reasoning

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Computer Science Applications

Fingerprint

Dive into the research topics of '3DRefTransformer: Fine-Grained Object Identification in Real-World Scenes Using Natural Language'. Together they form a unique fingerprint.

Cite this