TY - JOUR
T1 - Highlighting nonlinear patterns in population genetics datasets
AU - Alanis Lobato, Gregorio
AU - Cannistraci, Carlo Vittorio
AU - Eriksson, Anders
AU - Manica, Andrea
AU - Ravasi, Timothy
N1 - KAUST Repository Item: Exported on 2020-10-01
PY - 2015/1/30
Y1 - 2015/1/30
N2 - Detecting structure in population genetics and case-control studies is important, as it exposes phenomena such as ecoclines, admixture and stratification. Principal Component Analysis (PCA) is a linear dimension-reduction technique commonly used for this purpose, but it struggles to reveal complex, nonlinear data patterns. In this paper we introduce non-centred Minimum Curvilinear Embedding (ncMCE), a nonlinear method to overcome this problem. Our analyses show that ncMCE can separate individuals into ethnic groups in cases in which PCA fails to reveal any clear structure. This increased discrimination power arises from ncMCE's ability to better capture the phylogenetic signal in the samples, whereas PCA better reflects their geographic relation. We also demonstrate how ncMCE can discover interesting patterns, even when the data has been poorly pre-processed. The juxtaposition of PCA and ncMCE visualisations provides a new standard of analysis with utility for discovering and validating significant linear/nonlinear complementary patterns in genetic data.
UR - http://hdl.handle.net/10754/344117
UR - http://www.nature.com/doifinder/10.1038/srep08140
UR - http://www.scopus.com/inward/record.url?scp=84944779923&partnerID=8YFLogxK
U2 - 10.1038/srep08140
DO - 10.1038/srep08140
M3 - Article
C2 - 25633916
SN - 2045-2322
VL - 5
JO - Scientific Reports
JF - Scientific Reports
IS - 1
ER -