SoccerNet-Caption: Dense Video Captioning for Soccer Broadcasts Commentaries

Hassan Mkhallati*, Anthony Cioppa, Silvio Giancola, Bernard Ghanem, Marc Van Droogenbroeck

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

6 Scopus citations

Abstract

Soccer is more than just a game - it is a passion that transcends borders and unites people worldwide. From the roar of the crowds to the excitement of the commentators, every moment of a soccer match is a thrill. Yet, with so many games happening simultaneously, fans cannot watch them all live. Notifications for main actions can help, but lack the engagement of live commentary, leaving fans feeling disconnected. To fulfill this need, we propose in this paper a novel task of dense video captioning focusing on the generation of textual commentaries anchored with single timestamps. To support this task, we present a challenging dataset consisting of almost 37k timestamped commentaries across 715.9 hours of soccer broadcast videos. Additionally, we propose a first benchmark and baseline for this task, highlighting the difficulty of temporally anchoring commentaries yet showing the capacity to generate meaningful commentaries. By providing broadcasters with a tool to summarize the content of their video with the same level of engagement as a live game, our method could help satisfy the needs of the numerous fans who follow their team but cannot necessarily watch the live game. We believe our method has the potential to enhance the accessibility and understanding of soccer content for a wider audience, bringing the excitement of the game to more people.

Original language: English (US)
Title of host publication: Proceedings - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2023
Publisher: IEEE Computer Society
Pages: 5074-5085
Number of pages: 12
ISBN (Electronic): 9798350302493
State: Published - 2023
Event: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2023 - Vancouver, Canada
Duration: Jun 18 2023 - Jun 22 2023

Publication series

Name: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
Volume: 2023-June
ISSN (Print): 2160-7508
ISSN (Electronic): 2160-7516

Conference

Conference: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2023
Country/Territory: Canada
City: Vancouver
Period: 06/18/23 - 06/22/23

Bibliographical note

Funding Information:
This paper proposes the novel task of single-anchored dense video captioning focusing on generating textual commentaries anchored with single timestamps. To support this task, we present SoccerNet-Caption, a challenging dataset consisting of 37k timestamped commentaries across 715.9 hours of soccer broadcast videos. We benchmarked a first baseline algorithm on this dataset, highlighting the difficulty of temporally anchoring commentaries yet showing the capacity to generate meaningful commentaries.

Acknowledgement. This work was partly supported by KAUST OSR through the VCC funding and the SDAIA-KAUST Center of Excellence in Data Science and Artificial Intelligence. A. Cioppa is funded by the F.R.S.-FNRS.

Publisher Copyright:
© 2023 IEEE.

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering
