This paper introduces ArtELingo, a new benchmark and dataset designed to encourage work on diversity across languages and cultures. Following ArtEmis, a collection of 80K artworks from WikiArt with 0.45M emotion labels and English-only captions, ArtELingo adds another 0.79M annotations in Arabic and Chinese, plus 4.8K in Spanish to evaluate “cultural-transfer” performance. More than 51K artworks have 5 annotations or more in 3 languages. This diversity makes it possible to study similarities and differences across languages and cultures. Further, we investigate captioning tasks and find that diversity improves the performance of baseline models. ArtELingo is publicly available with standard splits and baseline models. We hope our work will help ease future research on multilinguality and culturally aware AI.
|Original language||English (US)|
|Number of pages||16|
|State||Published - 2022|
|Event||2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 - Abu Dhabi, United Arab Emirates (Dec 7 2022 → Dec 11 2022)|
|Conference||2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022|
|Country/Territory||United Arab Emirates|
|Period||12/7/22 → 12/11/22|
Bibliographical note
Funding Information:
This work was supported by King Abdullah University of Science and Technology (KAUST), under Award No. BAS/1/1685-01-01.
© 2022 Association for Computational Linguistics.
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Computer Science Applications
- Information Systems