Motion Dynamics Improve Speaker-Independent Lipreading

Matteo Riva, Michael Wand, Jurgen Schmidhuber

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations


We present a novel lipreading system that improves speaker-independent word recognition by decoupling motion and content dynamics. We achieve this with a deep learning architecture that processes motion and content in two distinct pipelines and subsequently merges them, yielding an end-to-end trainable system that fuses independently learned representations. Compared with a baseline that uses a standard architecture, we obtain an average relative word accuracy improvement of ≈6.8% on unseen speakers and ≈3.3% on known speakers.
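The two-pipeline idea can be illustrated with a minimal forward-pass sketch. The abstract does not say how motion is extracted or how each pipeline is built, so everything here is an assumption: the motion input is taken to be differences between consecutive frames, each "deep" pipeline is replaced by a single linear layer with ReLU followed by averaging over time, and the two representations are fused by concatenation before a word classifier. All function names, weights and dimensions are hypothetical, and no training is shown (in the described system both pipelines and the fusion are trained jointly, end to end).

```python
import numpy as np


def motion_stream_input(frames: np.ndarray) -> np.ndarray:
    """Hypothetical motion representation: differences between consecutive frames.

    frames: (T, H, W) grayscale mouth crops -> returns (T - 1, H, W).
    """
    return np.diff(frames, axis=0)


def encode(x: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a deep pipeline: a per-frame linear map + ReLU,
    then an average over time, giving one fixed-size vector per clip."""
    flat = x.reshape(x.shape[0], -1)          # (T', H*W)
    hidden = np.maximum(flat @ weights, 0.0)  # (T', D)
    return hidden.mean(axis=0)                # (D,)


def two_stream_logits(frames, w_content, w_motion, w_out):
    """Run the content and motion pipelines separately, then fuse them."""
    content = encode(frames, w_content)                     # content pipeline
    motion = encode(motion_stream_input(frames), w_motion)  # motion pipeline
    fused = np.concatenate([content, motion])               # fusion by concatenation (assumed)
    return fused @ w_out                                    # one score per vocabulary word


# Usage with random, purely illustrative weights and dimensions.
rng = np.random.default_rng(0)
T, H, W, D, num_words = 12, 16, 16, 32, 10
frames = rng.standard_normal((T, H, W))
w_content = rng.standard_normal((H * W, D)) * 0.01
w_motion = rng.standard_normal((H * W, D)) * 0.01
w_out = rng.standard_normal((2 * D, num_words)) * 0.01

logits = two_stream_logits(frames, w_content, w_motion, w_out)
predicted_word = int(np.argmax(logits))
print(logits.shape)  # (10,)
```

Keeping the streams separate until the final layer means each pipeline learns its own representation (static mouth appearance versus frame-to-frame change) before they are combined, which is the decoupling the abstract credits for the speaker-independent gains.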
Original language: English (US)
Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 5
ISBN (Print): 9781509066315
State: Published - May 1 2020
Externally published: Yes

Bibliographical note

Generated from Scopus record by KAUST IRTS on 2022-09-14

