Abstract
Motivation: Genome-Wide Association Studies (GWAS) pose several computational and statistical challenges for data analysis, including knowledge discovery, interpretability, and translation to clinical practice.
Results: We develop, apply, and comparatively evaluate an Automated Machine Learning (AutoML) approach, customized for genomic data, that delivers reliable predictive and diagnostic models, the set of genetic variants that are important for predictions (called a biosignature), and an estimate of the out-of-sample predictive power. This AutoML approach discovers variants with higher predictive performance than standard GWAS methods, computes an individual risk prediction score, generalizes to new, unseen data, is shown to better differentiate causal variants from other highly correlated variants, and enhances knowledge discovery and interpretability by reporting multiple equivalent biosignatures.
| Field | Value |
|---|---|
| Original language | English (US) |
| Journal | Bioinformatics (Oxford, England) |
| DOIs | |
| State | Published, Sep 6 2023 |
Bibliographical note
KAUST Repository Item: Exported on 2023-09-08.

Acknowledgements: The research work was supported by the European Research Council (ERC) under the European Union’s Seventh Framework Programme (FP/2007–2013) (grant agreement no. 617393); the METALASSO project, which is co-financed by the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH–CREATE–INNOVATE (project code: T1EDK-04347); and the Hellenic Foundation for Research and Innovation (H.F.R.I.) under the “First Call for H.F.R.I. Research Projects to support Faculty members and Researchers and the procurement of high-cost research equipment grant” (Project Number: 1941). This study makes use of data generated by the Wellcome Trust Case-Control Consortium. A full list of the investigators who contributed to the generation of the data is available from: www.wtccc.org.uk. Funding for the project was provided by the Wellcome Trust under awards 076113, 085475 and 090355. We sincerely thank Professor Ioanna Tzoulaki for comments on the manuscript; Professors George Dedousis and Pavlos Pavlidis for fruitful discussions; and Elissavet Greasidou for her help with data acquisition and cleaning. We also thank several members of our mensxmachina research group for useful comments, and Glykeria Fragioudaki for her administrative help with data access.
ASJC Scopus subject areas
- Biochemistry
- Computational Theory and Mathematics
- Computational Mathematics
- Molecular Biology
- Statistics and Probability
- Computer Science Applications