Curating Reagents in Chemical Reaction Data with an Interactive Reagent Space Map

Mikhail Andronov*, Natalia Andronova, Michael Wand, Jürgen Schmidhuber, Djork-Arné Clevert

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

The increasing use of machine learning and artificial intelligence in chemical reaction studies demands high-quality reaction data, which in turn calls for specialized tools for understanding and curating such data. Our work introduces a novel methodology for examining reaction data that centers on reagents: molecules that are essential for a reaction to proceed but do not contribute atoms to its products. We propose an intuitive tool for creating interactive reagent space maps from distributed vector representations that, akin to word2vec in Natural Language Processing, capture the statistics of reagent usage within a dataset. Our approach enables swift assessment of reagent action patterns and identification of erroneous reagent entries, which we demonstrate on the USPTO dataset. Our contributions include an open-source web application for visual analysis of reagent patterns and a table cataloging around six hundred of the most frequent reagents in USPTO, annotated with their detailed roles. Our method aims to support organic chemists and cheminformatics experts in routine reaction data curation.
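As a minimal sketch of the idea of usage-based reagent vectors, assuming toy data and swapping in a simpler count-based method for word2vec, the code below builds positive pointwise mutual information (PPMI) vectors from how often two reagents appear in the same reaction and compares them with cosine similarity. Word2vec's skip-gram with negative sampling is known to be closely related to factorizing a shifted PMI matrix, which makes this a reasonable stand-in for illustration. The reagent sets and all function names (cooccurrence, ppmi_vectors, cosine, nearest) are hypothetical and are not taken from the authors' application.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each set holds the reagents recorded for one reaction.
REACTIONS = [
    {"K2CO3", "DMF"},
    {"K2CO3", "DMF"},
    {"Cs2CO3", "DMF"},
    {"Et3N", "DCM"},
    {"Et3N", "DCM"},
    {"DIPEA", "DCM"},
    {"Pd(PPh3)4", "K2CO3", "toluene"},
]


def cooccurrence(reactions):
    """Count, in both directions, how often two reagents share a reaction."""
    pair_counts = Counter()
    for reagents in reactions:
        for a, b in combinations(sorted(reagents), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return pair_counts


def ppmi_vectors(pair_counts):
    """Map each reagent to a sparse PPMI vector {context reagent: weight}.

    Reagents that never co-occur with another reagent get no vector.
    """
    totals = Counter()
    for (a, _), n in pair_counts.items():
        totals[a] += n
    grand_total = sum(pair_counts.values())
    vectors = {r: {} for r in totals}
    for (a, b), n in pair_counts.items():
        # Counts are symmetric, so row and column marginals coincide.
        pmi = math.log(n * grand_total / (totals[a] * totals[b]))
        if pmi > 0:  # keep only positive PMI
            vectors[a][b] = pmi
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest(vectors, reagent, k=3):
    """Return the k reagents whose usage vectors are most similar to `reagent`."""
    scores = [
        (other, cosine(vectors[reagent], vec))
        for other, vec in vectors.items()
        if other != reagent
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:k]


vectors = ppmi_vectors(cooccurrence(REACTIONS))
print(nearest(vectors, "Et3N", k=1)[0][0])  # DIPEA
print(nearest(vectors, "K2CO3", k=1)[0][0])  # Cs2CO3
```

Reagents used in similar contexts, such as the two amine bases that both appear with DCM here, end up with similar vectors. In a real dataset, a reagent whose nearest neighbors look chemically implausible may point to an erroneous entry. Projecting such vectors to two dimensions would give a map of the kind the abstract describes, though the exact training and projection pipeline of the authors' tool is not specified in this record.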

Original language: English (US)
Title of host publication: AI in Drug Discovery - 1st International Workshop, AIDD 2024, Held in Conjunction with ICANN 2024, Proceedings
Editors: Djork-Arné Clevert, Michael Wand, Jürgen Schmidhuber, Kristína Malinovská, Igor V. Tetko
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 21-35
Number of pages: 15
ISBN (Print): 9783031723803
State: Published, 2025
Event: 1st International Workshop on AI in Drug Discovery, AIDD 2024, held as a part of the 33rd International Conference on Artificial Neural Networks, ICANN 2024 - Lugano, Switzerland
Duration: Sep 19 2024 to Sep 19 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14894 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 1st International Workshop on AI in Drug Discovery, AIDD 2024, held as a part of the 33rd International Conference on Artificial Neural Networks, ICANN 2024
Country/Territory: Switzerland
City: Lugano
Period: 09/19/24 to 09/19/24

Bibliographical note

Publisher Copyright:
© The Author(s) 2025.

Keywords

  • Chemical data curation
  • Reagents
  • USPTO
  • word2vec

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
