Feature selection for high-dimensional integrated data

Charles Zheng, Scott Schwartz, Robert S. Chapkin, Raymond J. Carroll, Ivan Ivanov

Research output: Chapter in Book/Report/Conference proceeding › Chapter


Abstract

Motivated by the problem of identifying correlations between genes or features of two related biological systems, we propose a model of feature selection in which only a subset of the predictors, X_t, depends on the multidimensional variate Y, and the remaining predictors constitute a “noise set” X_u that is independent of Y. Using Monte Carlo simulations, we investigated the relative performance of two methods, thresholding and singular-value decomposition (SVD), in combination with stochastic optimization to determine “empirical bounds” on the small-sample accuracy of an asymptotic approximation. We demonstrate the utility of the thresholding and SVD feature selection methods on a recent infant intestinal gene expression and metagenomics dataset.
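The abstract names the two selection methods but does not spell out their exact form. The following is a minimal sketch, not the authors' procedure, built on several labeled assumptions: a toy linear-Gaussian instance of the model in which a latent factor z is shared by X_t and every column of Y; "thresholding" read as keeping a feature whose largest absolute sample correlation with any column of Y exceeds a cutoff tau; and "SVD" read as ranking features by their loadings on the leading left singular vector of the cross-correlation matrix. The function names (simulate, cross_correlation, select_by_threshold, select_by_svd), the values of tau and k, and the simulation settings are hypothetical, and the stochastic optimization step for empirical bounds is not shown.

```python
import math
import random


def simulate(n, p_t, p_u, q, signal=1.0, seed=0):
    """Toy instance of the model (an assumption for illustration): the first p_t
    columns of X (the "true set" X_t) share a latent factor z with every column
    of Y; the last p_u columns (the "noise set" X_u) are independent of Y."""
    rng = random.Random(seed)
    X, Y = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # latent factor linking X_t and Y
        x_t = [signal * z + rng.gauss(0.0, 1.0) for _ in range(p_t)]
        x_u = [rng.gauss(0.0, 1.0) for _ in range(p_u)]  # pure noise
        y = [z + rng.gauss(0.0, 1.0) for _ in range(q)]
        X.append(x_t + x_u)
        Y.append(y)
    return X, Y


def standardize_columns(rows):
    """Return the columns of a row-major matrix, centered and scaled to unit variance."""
    out = []
    for c in zip(*rows):
        mean = sum(c) / len(c)
        sd = math.sqrt(sum((v - mean) ** 2 for v in c) / len(c))
        out.append([(v - mean) / sd for v in c])
    return out


def cross_correlation(X, Y):
    """p x q matrix of sample Pearson correlations between columns of X and of Y."""
    n = len(X)
    xs, ys = standardize_columns(X), standardize_columns(Y)
    return [[sum(a * b for a, b in zip(xc, yc)) / n for yc in ys] for xc in xs]


def select_by_threshold(C, tau):
    """Keep feature j if its largest |correlation| with any column of Y exceeds tau."""
    return [j for j, row in enumerate(C) if max(abs(c) for c in row) > tau]


def leading_left_singular_vector(C, iters=200):
    """Power iteration on C C^T; converges (up to sign) to the top left singular vector."""
    p, q = len(C), len(C[0])
    u = [1.0] * p
    for _ in range(iters):
        v = [sum(C[j][k] * u[j] for j in range(p)) for k in range(q)]  # v = C^T u
        u = [sum(C[j][k] * v[k] for k in range(q)) for j in range(p)]  # u = C v
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
    return u


def select_by_svd(C, k):
    """Keep the k features with the largest |loading| on the leading singular vector."""
    u = leading_left_singular_vector(C)
    ranked = sorted(range(len(u)), key=lambda j: abs(u[j]), reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    X, Y = simulate(n=500, p_t=5, p_u=45, q=3, signal=1.0, seed=1)
    C = cross_correlation(X, Y)
    print("thresholding:", select_by_threshold(C, tau=0.25))
    print("SVD:", select_by_svd(C, k=5))
```

In this toy setting each true feature has population correlation 0.5 with each column of Y, while correlations for the noise set fluctuate on the order of 1/sqrt(n), about 0.045 for n = 500, so tau = 0.25 separates the two sets comfortably; in smaller samples the gap shrinks and the choice of cutoff matters more. The SVD variant above assumes the size k of the true set is known, which a practical method would have to replace with a data-driven cutoff.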

Original language: English (US)
Title of host publication: Proceedings of the 2012 SIAM International Conference on Data Mining
Publisher: Society for Industrial & Applied Mathematics (SIAM)
ISBN (Print): 9781611972320
State: Published - Dec 18 2013
Externally published: Yes

Bibliographical note

KAUST Repository Item: Exported on 2020-10-01
Acknowledged KAUST grant number(s): KUS-C1-016-04
Acknowledgements: We are indebted to the Texas A&M Brazos Computing Cluster and Institute of Developmental and Molecular Biology for access to computing resources, and to professors David B. Dahl, Mohsen Pourahmadi, and Joel Zinn for helpful discussions. The infant microarray-metagenomics data was provided courtesy of Sharon M. Donovan, of the Division of Nutritional Sciences, U. of Illinois, Urbana, IL. This publication is based in part on work supported by Award No. KUS-C1-016-04, made by King Abdullah University of Science and Technology (KAUST).
This publication acknowledges KAUST support, but has no KAUST affiliated authors.
