Abstract
We introduce a new cross-modal fusion technique designed for generative error correction in automatic speech recognition (ASR). Our methodology leverages both acoustic information and external linguistic representations to generate accurate speech transcription contexts. This marks a step towards a fresh paradigm in generative error correction within the realm of n-best hypotheses. Unlike existing ranking-based rescoring methods, our approach uses distinct initialization techniques and parameter-efficient algorithms to boost ASR performance derived from pre-trained speech and text models. Evaluating our fusion technique across diverse ASR datasets, we demonstrate a 37.66% relative improvement in word error rate (WER) compared to the n-best Oracle. To encourage future research, we have made our code and pre-trained models open source at https://github.com/Srijith-rkr/Whispering-LLaMA.
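As a rough illustration of the fusion idea described in the abstract, the sketch below shows one way a parameter-efficient, trainable adapter could cross-attend hidden states of a frozen text LLM (prompted with the n-best hypotheses) to features from a frozen speech encoder. This is not the authors' released Whispering-LLaMA implementation; all module names, dimensions, and the zero-initialized gating design are assumptions for illustration only.

```python
# Minimal PyTorch sketch (assumed design, not the paper's code): a small
# adapter injects acoustic features into a frozen LLM that rewrites the
# n-best hypotheses into a corrected transcription.
import torch
import torch.nn as nn

class CrossModalAdapter(nn.Module):
    """Parameter-efficient adapter: cross-attends LLM hidden states to
    acoustic features and adds a gated residual update."""
    def __init__(self, d_text: int, d_audio: int, d_bottleneck: int = 256, n_heads: int = 4):
        super().__init__()
        self.down = nn.Linear(d_text, d_bottleneck)          # bottleneck projection of text states
        self.audio_proj = nn.Linear(d_audio, d_bottleneck)   # map acoustic features to bottleneck width
        self.cross_attn = nn.MultiheadAttention(d_bottleneck, n_heads, batch_first=True)
        self.up = nn.Linear(d_bottleneck, d_text)            # project back to LLM width
        self.gate = nn.Parameter(torch.zeros(1))              # zero-init gate: adapter starts as identity

    def forward(self, text_hidden: torch.Tensor, audio_feats: torch.Tensor) -> torch.Tensor:
        q = self.down(text_hidden)                 # (B, T_text, d_bottleneck)
        kv = self.audio_proj(audio_feats)          # (B, T_audio, d_bottleneck)
        fused, _ = self.cross_attn(q, kv, kv)      # text tokens attend to audio frames
        return text_hidden + torch.tanh(self.gate) * self.up(fused)

# Hypothetical usage: the frozen LLM consumes a prompt built from the n-best
# hypotheses, and each adapter layer mixes in frozen speech-encoder features.
adapter = CrossModalAdapter(d_text=4096, d_audio=1280)
text_hidden = torch.randn(2, 64, 4096)    # hidden states for the n-best prompt tokens
audio_feats = torch.randn(2, 1500, 1280)  # e.g., Whisper-style encoder output frames
out = adapter(text_hidden, audio_feats)   # same shape as text_hidden
```

Only the adapter parameters would be trained in such a setup, which is what makes the approach parameter-efficient; the zero-initialized gate keeps the pre-trained LLM's behavior intact at the start of fine-tuning.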
Original language | English (US) |
---|---|
Title of host publication | EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings |
Editors | Houda Bouamor, Juan Pino, Kalika Bali |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 10007-10016 |
Number of pages | 10 |
ISBN (Electronic) | 9798891760608 |
State | Published - 2023 |
Event | 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023 - Hybrid, Singapore, Singapore |
Duration | Dec 6 2023 → Dec 10 2023 |
Publication series
Name | EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings |
---|---|
Conference
Conference | 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023 |
---|---|
Country/Territory | Singapore |
City | Hybrid, Singapore |
Period | 12/6/23 → 12/10/23 |
Bibliographical note
Publisher Copyright: © 2023 Association for Computational Linguistics.
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Computer Science Applications
- Information Systems
- Linguistics and Language