X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Models

Jan Held*, Hani Itani, Anthony Cioppa, Silvio Giancola, Bernard Ghanem, Marc Van Droogenbroeck

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

4 Scopus citations

Abstract

The rapid advancement of artificial intelligence has led to significant improvements in automated decision-making. However, the increased performance of models often comes at the cost of explainability and transparency of their decision-making processes. In this paper, we investigate the capabilities of large language models to explain decisions, using football refereeing as a testing ground, given its decision complexity and subjectivity. We introduce the EXplainable Video Assistant Referee System, X-VARS, a multi-modal large language model designed for understanding football videos from the point of view of a referee. X-VARS can perform a multitude of tasks, including video description, question answering, action recognition, and conducting meaningful conversations based on video content and in accordance with the Laws of the Game for football referees. We validate X-VARS on our novel dataset, SoccerNet-XFoul, which consists of more than 22k video-question-answer triplets annotated by over 70 experienced football referees. Our experiments and human study illustrate the impressive capabilities of X-VARS in interpreting complex football clips. Furthermore, we highlight the potential of X-VARS to reach human performance and support football referees in the future. We will provide code, model, dataset, and demo upon publication.

Original languageEnglish (US)
Title of host publicationProceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024
PublisherIEEE Computer Society
Pages3267-3279
Number of pages13
ISBN (Electronic)9798350365474
DOIs
StatePublished - 2024
Event2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024 - Seattle, United States
Duration: Jun 16 2024Jun 22 2024

Publication series

NameIEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
ISSN (Print)2160-7508
ISSN (Electronic)2160-7516

Conference

Conference2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2024
Country/TerritoryUnited States
CitySeattle
Period06/16/2406/22/24

Bibliographical note

Publisher Copyright:
© 2024 IEEE.

Keywords

  • Explainability
  • Football
  • Multi-modal language model
  • Referee

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering

Fingerprint

Dive into the research topics of 'X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Models'. Together they form a unique fingerprint.

Cite this