Abstract
We introduce Affective Visual Dialog, an emotion explanation and reasoning task as a testbed for research on understanding constructed emotions in response to visually grounded conversations. The task involves three skills: (1) Dialog-based Question Answering (2) Dialog-based Emotion Prediction and (3) Affective explanation generation based on the dialog. Our key contribution is the collection of a large-scale dataset, dubbed AffectVisDial, consisting of 50K 10-turn visually grounded dialogs as well as concluding emotion attributions and dialog-informed textual emotion explanations, resulting in a total of 27,180 working hours. Notably, the dataset spans a broad range of visual stimuli, covering human heritage and contemporary life, with an average per-turn answer length of about 12 words—5 times that of the VisDial dataset—and explanations exceeding 28 words on average. We explain our determining design decisions in collecting the dataset, data inclusion and exclusion criteria starting from over 100K dialogs for quality control, and introduce the questioner and answerer tasks that are associated with the participants in the conversation. We propose and demonstrate solid Affective Visual Dialog baselines adapted from state-of-the-art multimodal models. Remarkably, the responses generated by our models show promising emotional reasoning abilities in response to visually grounded conversations. Our project page with the dataset is available through https://affective-visual-dialog.github.io.
Original language | English (US) |
---|---|
Title of host publication | Computer Vision – ECCV 2024 - 18th European Conference, Proceedings |
Editors | Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 18-36 |
Number of pages | 19 |
ISBN (Print) | 9783031732256 |
DOIs | |
State | Published - 2025 |
Event | 18th European Conference on Computer Vision, ECCV 2024 - Milan, Italy Duration: Sep 29 2024 → Oct 4 2024 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 15133 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 18th European Conference on Computer Vision, ECCV 2024 |
---|---|
Country/Territory | Italy |
City | Milan |
Period | 09/29/24 → 10/4/24 |
Bibliographical note
Publisher Copyright:© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
Keywords
- affective computing
- affective reasoning
- vision and language
ASJC Scopus subject areas
- Theoretical Computer Science
- General Computer Science