Improving speaker-independent lipreading with domain-adversarial training

Michael Wand, Jürgen Schmidhuber

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

28 Scopus citations

Abstract

We present a lipreading system, i.e. a speech recognition system using only visual features, which uses domain-adversarial training for speaker independence. Domain-adversarial training is integrated into the optimization of a lipreader based on a stack of feedforward and LSTM (Long Short-Term Memory) recurrent neural networks, yielding an end-to-end trainable system which requires only a very small number of frames of untranscribed target data to substantially improve the recognition accuracy on the target speaker. On pairs of different source and target speakers, we achieve a relative accuracy improvement of around 40% with only 15 to 20 seconds of untranscribed target speech data. On multi-speaker training setups, the accuracy improvements are smaller but still substantial.
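The abstract's core mechanism is domain-adversarial training: a shared feature extractor feeds both a label classifier (trained on transcribed source-speaker data) and a domain classifier (trained to tell source from target speaker), with the domain classifier's gradient reversed before it reaches the feature extractor, pushing the features toward speaker invariance. The following is a minimal NumPy sketch of that idea with hand-rolled gradients; the toy data, single-linear-layer extractor, and all hyperparameters are illustrative assumptions, not the paper's feedforward-plus-LSTM architecture.

```python
import numpy as np

# Minimal sketch of domain-adversarial training with an explicit
# gradient-reversal step. Shapes, data, and the linear extractor are
# illustrative assumptions; the paper stacks feedforward and LSTM layers.
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: source speaker (domain 0, labeled) and target speaker
# (domain 1, unlabeled), with a speaker-dependent feature shift.
n = 200
x_src = rng.normal(0.0, 1.0, (n, 4))
y_src = (x_src[:, 0] > 0).astype(float)      # "word" label, source only
x_tgt = rng.normal(0.0, 1.0, (n, 4)) + 1.5   # shifted target features

W_f = rng.normal(0, 0.1, (4, 4))  # shared feature extractor
w_y = rng.normal(0, 0.1, 4)       # label-classifier head
w_d = rng.normal(0, 0.1, 4)       # domain-classifier head
lam, lr = 0.1, 0.1                # reversal strength, learning rate

for _ in range(500):
    f_src, f_tgt = x_src @ W_f, x_tgt @ W_f

    # Label loss (logistic) on labeled source data only.
    g_y = (sigmoid(f_src @ w_y) - y_src) / n        # dL_y / dlogits
    grad_wy = f_src.T @ g_y
    grad_Wf_label = x_src.T @ np.outer(g_y, w_y)

    # Domain loss (logistic) on both domains; labels: 0=source, 1=target.
    x_all = np.vstack([x_src, x_tgt])
    f_all = np.vstack([f_src, f_tgt])
    d_all = np.concatenate([np.zeros(n), np.ones(n)])
    g_d = (sigmoid(f_all @ w_d) - d_all) / (2 * n)  # dL_d / dlogits
    grad_wd = f_all.T @ g_d
    grad_Wf_domain = x_all.T @ np.outer(g_d, w_d)

    # Both heads descend their own losses, but the domain gradient is
    # REVERSED (scaled by -lam) before reaching the feature extractor,
    # so shared features become harder to classify by speaker/domain.
    w_y -= lr * grad_wy
    w_d -= lr * grad_wd
    W_f -= lr * (grad_Wf_label - lam * grad_Wf_domain)

# Diagnostics: label accuracy on source data, domain-classifier accuracy
# on the adversarially trained features (ideally pushed toward chance).
acc_label = np.mean((sigmoid(x_src @ W_f @ w_y) > 0.5) == (y_src > 0.5))
acc_domain = np.mean((sigmoid(np.vstack([x_src @ W_f, x_tgt @ W_f]) @ w_d) > 0.5) == d_all)
```

The key design point is the single sign flip: the domain head is trained normally, while the feature extractor receives the negated domain gradient, which is what makes a small amount of untranscribed target data usable for adaptation.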
Original language: English (US)
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: International Speech Communication Association, 4 Rue des Fauvettes, Lous Tourils, Baixas 66390
Pages: 3662-3666
Number of pages: 5
DOIs
State: Published - Jan 1 2017
Externally published: Yes

Bibliographical note

Generated from Scopus record by KAUST IRTS on 2022-09-14
