Latent Feature Representations for Human Gene Expression Data Improve Phenotypic Predictions

Yannis Pantazis, Christos Tselas, Kleanthi Lakiotaki, Vincenzo Lagani, Ioannis Tsamardinos

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations


High-throughput technologies such as microarrays and RNA sequencing (RNA-seq) allow transcriptomic profiles to be quantified precisely, generating datasets that are inevitably high-dimensional. In this work, we investigate whether the whole human transcriptome can be represented in a compressed, low-dimensional latent space without losing relevant information. To this end, we constructed low-dimensional latent feature spaces of the human genome using three dimensionality reduction approaches and a diverse set of curated datasets. We applied standard Principal Component Analysis (PCA), kernel PCA and autoencoder neural networks to 1360 datasets from four different measurement technologies. The latent feature spaces are tested for their ability to (a) reconstruct the original data and (b) improve predictive performance on validation datasets not used during the creation of the feature space. While linear techniques show better reconstruction performance, nonlinear approaches, particularly neural-based models, seem able to capture non-additive interaction effects and thus have stronger predictive capabilities. Despite the limited sample size of each dataset and the biological and technological heterogeneity across studies, our results show that low-dimensional representations of the human transcriptome can be achieved by integrating hundreds of datasets. The resulting space is two to three orders of magnitude smaller than the raw data, yet it captures a large portion of the original data variability and reduces computational time for downstream analyses.
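To make the linear baseline and evaluation criterion (a) concrete, the following is a minimal NumPy sketch of PCA used as a compressed feature space. It fits the components on a pooled training matrix, then encodes and reconstructs a held-out matrix that was never used during fitting, and reports how much variance the reconstruction recovers. This is not the authors' pipeline. The data come from a hypothetical low-rank generator, and the sizes, noise level and choice of k are illustrative assumptions. Kernel PCA or an autoencoder would replace `fit_pca`, `encode` and `decode` (all hypothetical helper names) with nonlinear counterparts.

```python
import numpy as np

def fit_pca(X, k):
    """Fit a rank-k PCA on a samples-by-genes matrix X via thin SVD."""
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal principal directions, sorted by decreasing singular value
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def encode(X, mean, components):
    # Project centred expression profiles onto the k latent features
    return (X - mean) @ components.T

def decode(Z, mean, components):
    # Map latent features back to the original gene space
    return Z @ components + mean

def reconstruction_r2(X, X_hat):
    # Fraction of the total variance of X (around its column means) that X_hat recovers
    ss_res = np.sum((X - X_hat) ** 2)
    ss_tot = np.sum((X - X.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
n_genes, n_latent_true = 2000, 20
# Hypothetical generator: every sample shares one low-rank structure plus small noise
loadings = rng.normal(size=(n_latent_true, n_genes))

def simulate(n):
    return rng.normal(size=(n, n_latent_true)) @ loadings + 0.1 * rng.normal(size=(n, n_genes))

X_train = simulate(500)  # stands in for samples pooled from many datasets
X_valid = simulate(100)  # stands in for a validation dataset unseen during fitting

k = 20
mean, comps = fit_pca(X_train, k)
Z_valid = encode(X_valid, mean, comps)
r2 = reconstruction_r2(X_valid, decode(Z_valid, mean, comps))
print(f"latent shape {Z_valid.shape}, compression {n_genes / k:.0f}x, held-out R^2 {r2:.3f}")
```

In this toy setup, going from 2000 genes to 20 latent features is a 100-fold (two orders of magnitude) reduction. For criterion (b), a classifier would be trained on `Z_valid` instead of the raw profiles. Real expression data are not exactly low-rank, so held-out reconstruction would be noticeably worse than in this synthetic example.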
Original language: English (US)
Title of host publication: Proceedings - 2020 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 8
ISBN (Print): 9781728162157
State: Published - Dec 16 2020
Externally published: Yes

Bibliographical note

Generated from Scopus record by KAUST IRTS on 2023-09-23

