Real World Music Object Recognition

Lukas Tuggener*, Simon Goldschagg, Raphael Emberger, Florian Seibold, Adhiraj Ghosh, Urs Gut, Pascal Sager, Philipp Ackermann, Yvan Putra Satyawan, Jürgen Schmidhuber, Javier Montoya, Thilo Stadelmann

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

We present solutions to two of the most pressing issues in contemporary optical music recognition (OMR). We improve recognition accuracy on low-quality, real-world (i.e. containing ageing, lighting, or dirt artefacts among others) input data and provide confidence-rated model outputs to enable efficient human post-processing. Specifically, we present (i) a sophisticated input augmentation scheme that can reduce the gap between sanitised benchmarks and realistic tasks through a combination of synthetic data and noisy perturbations of real-world documents; (ii) an adversarial discriminative domain adaptation method that can be employed to improve the performance of OMR systems on low-quality data; and (iii) a combination of model ensembles and prediction fusion, which generates trustworthy confidence ratings for each prediction. We evaluate our contributions on a newly created test set consisting of manually annotated pages of varying real-world quality, sourced from the International Music Score Library Project (IMSLP)/Petrucci Music Library. With the presented data augmentation scheme, we achieve a doubling in detection performance from 36.0% to 73.3% on noisy real-world data compared to state-of-the-art training. This result is then combined with robust confidence ratings, paving the way for OMR to be deployed in the real world. Additionally, we show the merits of unsupervised adversarial domain adaptation for OMR, raising the 36.0% baseline to 48.9%.
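To make contribution (iii) concrete, the following is a minimal sketch of one common way to fuse ensemble predictions into a confidence rating: averaging per-class probabilities across ensemble members. It is an illustration under stated assumptions, not the fusion procedure used in the paper; the function name `fuse_predictions`, the class labels, and the review threshold are all hypothetical.

```python
from statistics import mean


def fuse_predictions(member_probs):
    """Fuse the class probabilities of several ensemble members for one object.

    member_probs: list of dicts, one per ensemble member, each mapping a
    class label to that member's probability for the same detected object.
    Returns (label, confidence), where confidence is the averaged probability
    of the winning label. Plain averaging is an assumption of this sketch.
    """
    if not member_probs:
        raise ValueError("at least one ensemble member is required")

    # Collect every label that any member proposed; sort for determinism.
    labels = sorted(set().union(*member_probs))

    # A member that did not propose a label contributes probability 0.0.
    fused = {
        label: mean(probs.get(label, 0.0) for probs in member_probs)
        for label in labels
    }

    best = max(labels, key=lambda label: fused[label])
    return best, fused[best]


# Hypothetical outputs of three ensemble members for one detected symbol.
members = [
    {"notehead": 0.9, "rest": 0.1},
    {"notehead": 0.8, "rest": 0.2},
    {"notehead": 0.4, "rest": 0.6},
]

label, confidence = fuse_predictions(members)

# Hypothetical threshold: low-confidence detections go to a human annotator.
REVIEW_THRESHOLD = 0.8
needs_review = confidence < REVIEW_THRESHOLD
```

In this example the members disagree, so the fused confidence for the winning label falls below the assumed threshold and the detection would be flagged for human post-processing.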

Original language: English (US)
Pages (from-to): 1-14
Number of pages: 14
Journal: Transactions of the International Society for Music Information Retrieval
Volume: 7
Issue number: 1
State: Published - 2024

Bibliographical note

Publisher Copyright:
© 2024 Ubiquity Press. All rights reserved.

Keywords

  • Adversarial Training
  • Data Augmentation
  • Deep Learning
  • Model Ensembles
  • Open Data
  • Optical Music Recognition

ASJC Scopus subject areas

  • Music
  • Linguistics and Language
  • Museology
  • Library and Information Sciences
