Learning biologically-interpretable latent representations for gene expression data: Pathway Activity Score Learning Algorithm

Ioulia Karagiannaki, Krystallia Gourlia, Vincenzo Lagani, Yannis Pantazis, Ioannis Tsamardinos

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Molecular gene-expression datasets consist of samples with tens of thousands of measured quantities (i.e., high-dimensional data). However, lower-dimensional representations that retain the useful biological information do exist. We present a novel algorithm for such dimensionality reduction called Pathway Activity Score Learning (PASL). The major novelty of PASL is that the constructed features directly correspond to known molecular pathways (genesets in general) and can be interpreted as pathway activity scores. Hence, unlike PCA and similar methods, PASL’s latent space has a fairly straightforward biological interpretation. PASL is shown to outperform the state-of-the-art method (PLIER) in predictive performance on two collections of breast cancer and leukemia gene expression datasets. PASL is also trained on a large corpus of 50,000 gene expression samples to construct a universal dictionary of features across different tissues and pathologies. The dictionary is validated on 35,643 held-out samples for reconstruction error. It is then applied to 165 held-out datasets spanning a diverse range of diseases. The AutoML tool JADBio is employed to show that the predictive information is retained in the PASL-created feature space after the transformation. The code is available at https://github.com/mensxmachina/PASL.
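
To make the idea of a geneset-constrained latent feature concrete, the following is a minimal sketch. It is not the PASL algorithm from the paper (the authors' implementation is in the repository linked above). As a simplifying assumption, it takes the first principal direction of the genes inside a single hypothetical geneset, forces the loadings of all other genes to zero, and uses the projection of each sample onto that vector as an illustrative "activity score". All function names and the toy data are hypothetical.

```python
import math
import random


def center_columns(X):
    """Subtract each gene's mean across samples (rows = samples, columns = genes)."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[v - m for v, m in zip(row, means)] for row in X]


def geneset_atom(Xc, geneset, n_iter=200, seed=0):
    """Return a unit-norm loading vector that is zero outside `geneset`.

    Assumption for illustration: the loadings are the first principal
    direction of the geneset's genes, found by power iteration.
    """
    rng = random.Random(seed)
    idx = sorted(geneset)
    sub = [[row[j] for j in idx] for row in Xc]  # samples x genes-in-set
    w = [rng.uniform(-1, 1) for _ in idx]
    for _ in range(n_iter):
        # One power-iteration step: w <- sub^T (sub w), then normalise.
        t = [sum(a * b for a, b in zip(row, w)) for row in sub]
        w = [sum(t[i] * sub[i][k] for i in range(len(sub)))
             for k in range(len(idx))]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0:
            raise ValueError("geneset has no variance in this data")
        w = [v / norm for v in w]
    atom = [0.0] * len(Xc[0])  # genes outside the set keep a zero loading
    for j, v in zip(idx, w):
        atom[j] = v
    return atom


def activity_scores(Xc, atom):
    """Project each centered sample onto the geneset-constrained vector."""
    return [sum(a * b for a, b in zip(row, atom)) for row in Xc]


# Toy data: 4 samples x 5 genes; genes 0-2 form one hypothetical geneset.
X = [
    [1.0, 2.0, 3.0, 0.5, 0.1],
    [2.0, 4.0, 6.0, 0.4, 0.2],
    [3.0, 6.0, 9.0, 0.6, 0.1],
    [4.0, 8.0, 12.0, 0.5, 0.3],
]
Xc = center_columns(X)
atom = geneset_atom(Xc, geneset={0, 1, 2})
scores = activity_scores(Xc, atom)
print(atom)
print(scores)
```

Because the loading vector is nonzero only on the geneset's genes, each score can be read as a summary of that geneset alone, which is the kind of interpretability the abstract contrasts with unconstrained PCA components.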
Original language: English (US)
Journal: Machine Learning
State: Published - Jan 1 2022
Externally published: Yes

Bibliographical note

Generated from Scopus record by KAUST IRTS on 2023-09-23

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
