LAM: Remote sensing image captioning with Label-Attention Mechanism

Zhengyuan Zhang, Wenhui Diao, Wenkai Zhang, Menglong Yan, Xin Gao, Xian Sun

Research output: Contribution to journal › Article › peer-review



Encoder-decoder frameworks have brought significant progress to remote sensing image captioning. The conventional attention mechanism is prevalent in this task but has a notable drawback: it relies only on visual information from the remote sensing images and does not use label information to guide the calculation of attention masks. To this end, a novel attention mechanism, the Label-Attention Mechanism (LAM), is proposed in this paper. LAM additionally exploits the label information of high-resolution remote sensing images to generate natural sentences describing the given images. Notably, the word embedding vectors of the predicted categories, rather than high-level image features, are used to guide the calculation of attention masks. Representing the content of images as word embedding vectors filters out redundant image features while preserving pure, useful information for generating complete sentences. Experimental results on UCM-Captions, Sydney-Captions and RSICD demonstrate that LAM improves the model's performance in describing high-resolution remote sensing images and achieves better Sm scores than other methods, where the Sm score is a hybrid scoring method derived from the AI Challenge 2017 scoring method. In addition, the validity of LAM is verified by an experiment using true labels.
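The core idea, computing an attention mask over image regions from the word embedding of a predicted label rather than from high-level image features, can be sketched as follows. This is an illustrative toy with NumPy, not the paper's actual architecture; the projection matrix `W` and all dimensions are hypothetical stand-ins for the learned components of the full encoder-decoder captioner.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def label_attention(regions, label_embedding, W):
    """Toy label-guided attention.

    regions:         (n_regions, d_img) image-region features
    label_embedding: (d_word,) word embedding of a predicted category
    W:               (d_img, d_word) hypothetical projection that aligns
                     the visual and word-embedding spaces
    """
    # Score each region against the label embedding (not against
    # high-level image features, which is the conventional choice).
    scores = regions @ W @ label_embedding   # (n_regions,)
    mask = softmax(scores)                   # attention mask over regions
    context = mask @ regions                 # (d_img,) attended feature
    return mask, context

rng = np.random.default_rng(0)
regions = rng.standard_normal((4, 8))
label_emb = rng.standard_normal(5)
W = rng.standard_normal((8, 5))
mask, context = label_attention(regions, label_emb, W)
```

The attention weights in `mask` sum to one, and `context` is the mask-weighted pooling of the region features that a decoder could then condition on when generating each caption word.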
Original language: English (US)
Journal: Remote Sensing
Issue number: 20
State: Published - Oct 1 2019
Externally published: Yes

Bibliographical note

Generated from Scopus record by KAUST IRTS on 2023-09-21

ASJC Scopus subject areas

  • General Earth and Planetary Sciences


