A Parameter-Efficient Learning Approach to Arabic Dialect Identification with Pre-Trained General-Purpose Speech Model

Srijith Radhakrishnan, Chao Han Huck Yang, Sumeer Ahmad Khan, Narsis A. Kiani, David Gomez-Cabrero, Jesper N. Tegner

Research output: Contribution to conferencePaperpeer-review

2 Scopus citations

Abstract

In this work, we explore Parameter-Efficient-Learning (PEL) techniques to repurpose a General-Purpose-Speech (GSM) model for Arabic dialect identification (ADI). Specifically, we investigate different setups to incorporate trainable features into a multi-layer encoder-decoder GSM formulation under frozen pre-trained settings. Our architecture includes residual adapter and model reprogramming (input-prompting). We design a token-level label mapping to condition the GSM for Arabic Dialect Identification (ADI). We achieve new state-of-the-art accuracy on the ADI-17 dataset by vanilla fine-tuning. We further reduce the training budgets with the PEL method, which performs within 1.86% accuracy to fine-tuning using only 2.5% of (extra) network trainable parameters. Our study demonstrates how to identify Arabic dialects using a small dataset and limited computation with open source code at https://github.com/Srijith-rkr/KAUST-Whisper-Adapter.

Original languageEnglish (US)
Pages1958-1962
Number of pages5
DOIs
StatePublished - 2023
Event24th International Speech Communication Association, Interspeech 2023 - Dublin, Ireland
Duration: Aug 20 2023Aug 24 2023

Conference

Conference24th International Speech Communication Association, Interspeech 2023
Country/TerritoryIreland
CityDublin
Period08/20/2308/24/23

Bibliographical note

Publisher Copyright:
© 2023 International Speech Communication Association. All rights reserved.

Keywords

  • Arabic Dialect
  • Dialect Identification
  • Parameter-Efficient Learning

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

Fingerprint

Dive into the research topics of 'A Parameter-Efficient Learning Approach to Arabic Dialect Identification with Pre-Trained General-Purpose Speech Model'. Together they form a unique fingerprint.

Cite this