Abstract
The increasing use of machine learning and artificial intelligence in chemical reaction studies demands high-quality reaction data, which in turn calls for specialized tools for understanding and curating that data. Our work introduces a novel methodology for reaction data examination centered on reagents, the essential molecules in a reaction that do not contribute atoms to the products. We propose an intuitive tool for creating interactive reagent space maps using distributed vector representations, akin to word2vec in Natural Language Processing, that capture the statistics of reagent usage within datasets. Our approach enables swift assessment of reagent action patterns and identification of erroneous reagent entries, which we demonstrate using the USPTO dataset. Our contributions include an open-source web application for visual reagent pattern analysis and a table cataloging around six hundred of the most frequent reagents in USPTO, annotated with detailed roles. Our method aims to support organic chemists and cheminformatics experts in routine reaction data curation.
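The idea of representing reagents by their usage statistics can be illustrated with a minimal sketch. It is not the authors' method: instead of a trained word2vec-style model it uses a simpler count-based stand-in (positive pointwise mutual information over reagent co-occurrence), and the toy reactions, the function names `ppmi_vectors` and `cosine`, and the reagent labels are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
import math

# Hypothetical toy data: each reaction is the set of reagents recorded for it.
reactions = [
    {"Pd(PPh3)4", "K2CO3", "dioxane"},
    {"Pd(dppf)Cl2", "K2CO3", "dioxane"},
    {"NaBH4", "MeOH"},
    {"LiAlH4", "THF"},
    {"NaBH4", "THF"},
]


def ppmi_vectors(reactions):
    """Return one PPMI co-occurrence vector per reagent (count-based stand-in for word2vec)."""
    vocab = sorted(set().union(*reactions))
    index = {reagent: i for i, reagent in enumerate(vocab)}

    # Count how often each ordered pair of reagents appears in the same reaction.
    pair_counts = Counter()
    for rxn in reactions:
        for a, b in combinations(sorted(rxn), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1

    total = sum(pair_counts.values())
    row_sums = Counter()
    for (a, _), count in pair_counts.items():
        row_sums[a] += count

    # PPMI = max(0, log(P(a, b) / (P(a) * P(b)))), estimated from the counts.
    vectors = {reagent: [0.0] * len(vocab) for reagent in vocab}
    for (a, b), count in pair_counts.items():
        pmi = math.log(count * total / (row_sums[a] * row_sums[b]))
        vectors[a][index[b]] = max(pmi, 0.0)
    return vectors


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


vectors = ppmi_vectors(reactions)
sim_catalysts = cosine(vectors["Pd(PPh3)4"], vectors["Pd(dppf)Cl2"])
sim_unrelated = cosine(vectors["Pd(PPh3)4"], vectors["NaBH4"])
print(sim_catalysts, sim_unrelated)
```

In this toy setting, reagents used in the same contexts (the two palladium catalysts) end up with similar vectors, while a reagent that never shares a context with them does not. A map of such vectors is the kind of view in which a mislabeled or out-of-place reagent entry would tend to stand out.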
Original language | English (US) |
---|---|
Title of host publication | AI in Drug Discovery - 1st International Workshop, AIDD 2024, Held in Conjunction with ICANN 2024, Proceedings |
Editors | Djork-Arné Clevert, Michael Wand, Jürgen Schmidhuber, Kristína Malinovská, Igor V. Tetko |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 21-35 |
Number of pages | 15 |
ISBN (Print) | 9783031723803 |
State | Published - 2025 |
Event | 1st International Workshop on AI in Drug Discovery, AIDD 2024, held as a part of the 33rd International Conference on Artificial Neural Networks, ICANN 2024 - Lugano, Switzerland. Duration: Sep 19 2024 → Sep 19 2024 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 14894 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 1st International Workshop on AI in Drug Discovery, AIDD 2024, held as a part of the 33rd International Conference on Artificial Neural Networks, ICANN 2024 |
---|---|
Country/Territory | Switzerland |
City | Lugano |
Period | 09/19/24 → 09/19/24 |
Bibliographical note
Publisher Copyright: © The Author(s) 2025.
Keywords
- Chemical data curation
- Reagents
- USPTO
- word2vec
ASJC Scopus subject areas
- Theoretical Computer Science
- General Computer Science