Variance estimation in the analysis of microarray data

Yuedong Wang, Yanyuan Ma, Raymond J. Carroll

Research output: Contribution to journal › Article › peer-review

19 Scopus citations


Microarrays are one of the most widely used high throughput technologies. One of the main problems in the area is that conventional estimates of the variances that are required in the t-statistic and other statistics are unreliable owing to the small number of replications. Various methods have been proposed in the literature to overcome this lack of degrees of freedom problem. In this context, it is commonly observed that the variance increases proportionally with the intensity level, which has led many researchers to assume that the variance is a function of the mean. Here we concentrate on estimation of the variance as a function of an unknown mean in two models: the constant coefficient of variation model and the quadratic variance-mean model. Because the means are unknown and estimated with few degrees of freedom, naive methods that use the sample mean in place of the true mean are generally biased because of the errors-in-variables phenomenon. We propose three methods for overcoming this bias. The first two are variations on the theme of the so-called heteroscedastic simulation-extrapolation estimator, modified to estimate the variance function consistently. The third class of estimators is entirely different, being based on semiparametric information calculations. Simulations show the power of our methods and their lack of bias compared with the naive method that ignores the measurement error. The methodology is illustrated by using microarray data from leukaemia patients.
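The errors-in-variables bias described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' SIMEX or semiparametric estimator; it is a minimal method-of-moments illustration under assumed conditions: a constant coefficient of variation model Var(Y) = theta * mu^2, normal errors, and m replicates per gene. The function name `simulate`, the parameter values, and the uniform distribution of the true means are all hypothetical choices. Under these assumptions E[S^2] = theta * mu^2 and E[Ybar^2] = mu^2 * (1 + theta/m), so the naive ratio tends to theta / (1 + theta/m) rather than theta, and inverting that relation gives a simple correction.

```python
import random


def simulate(n_genes=20000, m=3, theta=0.25, seed=0):
    """Compare a naive and a moment-corrected estimate of theta.

    Assumed model (hypothetical, for illustration only):
    Y_ij ~ Normal(mu_i, theta * mu_i**2), j = 1..m replicates per gene.
    """
    rng = random.Random(seed)
    sum_s2 = 0.0     # running sum of per-gene sample variances
    sum_ybar2 = 0.0  # running sum of squared per-gene sample means
    for _ in range(n_genes):
        mu = rng.uniform(1.0, 10.0)   # true mean, unknown in practice
        sd = (theta ** 0.5) * mu      # constant coefficient of variation
        y = [rng.gauss(mu, sd) for _ in range(m)]
        ybar = sum(y) / m
        s2 = sum((v - ybar) ** 2 for v in y) / (m - 1)
        sum_s2 += s2
        sum_ybar2 += ybar ** 2

    # Naive estimator: plugs the sample mean in for the true mean.
    naive = sum_s2 / sum_ybar2
    # Moment correction: solve naive = theta / (1 + theta / m) for theta.
    corrected = naive / (1.0 - naive / m)
    return naive, corrected


naive, corrected = simulate()
print(f"naive estimate:     {naive:.3f}")
print(f"corrected estimate: {corrected:.3f}")
```

With few replicates the naive estimate sits noticeably below the true theta, while the corrected value should land close to it. The gap shrinks as m grows, which is why the problem matters most in the small-replication setting the abstract describes.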
Original language: English (US)
Pages (from-to): 425-445
Number of pages: 21
Journal: Journal of the Royal Statistical Society: Series B (Statistical Methodology)
Issue number: 2
State: Published - Apr 2009
Externally published: Yes

Bibliographical note

KAUST Repository Item: Exported on 2020-10-01
Acknowledged KAUST grant number(s): KUS-CI-016-04
Acknowledgements: Wang's research was supported by a grant from the National Science Foundation (DMS-0706886). Ma's research was supported by the National Science Foundation of Switzerland. Carroll's research was supported by grants from the National Cancer Institute (CA-57030 and CA104620). Part of the work was based on work supported by award KUS-CI-016-04, made by King Abdullah University of Science and Technology. We thank Dr Strimmer for sending us the leukaemia data. We also thank the Joint Editor, Associate Editor and two referees for constructive comments that substantially improved an earlier draft.
This publication acknowledges KAUST support, but has no KAUST affiliated authors.

