End-to-end, single-stream temporal action detection in untrimmed videos

Shyamal Buch, Victor Escorcia, Bernard Ghanem, Li Fei-Fei, Juan Carlos Niebles

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

170 Scopus citations


In this work, we present a new intuitive, end-to-end approach for temporal action detection in untrimmed videos. We introduce our new architecture for Single-Stream Temporal Action Detection (SS-TAD), which effectively integrates joint action detection with its semantic sub-tasks in a single unifying end-to-end framework. We develop a method for training our deep recurrent architecture based on enforcing semantic constraints on intermediate modules that are gradually relaxed as learning progresses. We find that such a dynamic learning scheme enables SS-TAD to achieve higher overall detection performance, with fewer training epochs. By design, our single-pass network is very efficient and can operate at 701 frames per second, while simultaneously outperforming the state-of-the-art methods for temporal action detection on THUMOS’14.
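
The training idea described above, constraints on intermediate modules that are gradually relaxed as learning progresses, can be illustrated with a minimal sketch. This is not the authors' implementation: the linear decay schedule, the function names `constraint_weight` and `total_loss`, and the plain weighted-sum form of the objective are all assumptions made for illustration, since the abstract does not specify them.

```python
def constraint_weight(epoch, total_epochs, start=1.0, end=0.0):
    """Hypothetical schedule: linearly anneal the constraint weight
    from `start` (first epoch) to `end` (last epoch)."""
    if total_epochs <= 1:
        return end
    # Fraction of training completed, clamped to [0, 1].
    frac = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return start + (end - start) * frac


def total_loss(detection_loss, constraint_losses, epoch, total_epochs):
    """Assumed objective: the main detection loss plus the auxiliary
    (intermediate-module) losses, scaled by the annealed weight."""
    w = constraint_weight(epoch, total_epochs)
    return detection_loss + w * sum(constraint_losses)
```

Under these assumptions, the auxiliary terms dominate early in training and fade out by the final epoch, leaving only the detection objective.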
Original language: English (US)
Title of host publication: Proceedings of the British Machine Vision Conference 2017
Publisher: British Machine Vision Association
ISBN (Print): 190172560X
State: Published - May 1 2019

Bibliographical note

KAUST Repository Item: Exported on 2020-10-01

