Estimating p-values in small microarray experiments.

Bayes-Theorem, Computer-Simulation, False-Positive-Reactions, Gene-Expression-Profiling, Genetic-Screening, Models-Genetic, Models-Statistical, Oligonucleotide-Array-Sequence-Analysis, Sensitivity-and-Specificity

Bioinformatics 2007 Jan; 23(1):38-43.


MOTIVATION: Microarray data typically have small numbers of observations per gene, which can result in low power for statistical tests. Test statistics that borrow information from data across all of the genes can improve power, but these statistics have non-standard distributions, and their significance must be assessed using permutation analysis. When sample sizes are small, the number of distinct permutations can be severely limited, and pooling the permutation-derived test statistics across all genes has been proposed. However, the null distribution of the test statistics under permutation is not the same for equally and differentially expressed genes. This can have a negative impact on both p-value estimation and the power of information-borrowing statistics. RESULTS: We investigate permutation-based methods for estimating p-values. One of the methods, which pools permutation-derived statistics from a selected subset of the data, is shown to have the correct type I error rate and to provide accurate estimates of the false discovery rate (FDR). We provide guidelines for selecting an appropriate subset. We also demonstrate that information-borrowing statistics have substantially increased power compared to the t-test in small experiments.
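
The pooling idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses an ordinary two-sample t statistic in place of an information-borrowing statistic, and the function names (`t_stat`, `label_splits`, `pooled_permutation_pvalues`) and the `pool_genes` parameter are hypothetical. With 3 arrays per group there are only C(6,3) = 20 distinct label assignments per gene, which is why pooling the permuted statistics across genes is attractive.

```python
import itertools
import math
from statistics import mean, variance


def t_stat(a, b):
    """Two-sample t statistic with unpooled variances.
    Stand-in for an information-borrowing statistic (assumption).
    Each group needs at least two observations."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def label_splits(n_total, n_first):
    """Yield every distinct assignment of n_first of n_total arrays to group 1."""
    idx = range(n_total)
    for first in itertools.combinations(idx, n_first):
        chosen = set(first)
        second = tuple(i for i in idx if i not in chosen)
        yield first, second


def pooled_permutation_pvalues(data, n_first, pool_genes=None):
    """Estimate two-sided p-values from a pooled permutation null.

    data       : list of per-gene lists; the first n_first values are group 1
    pool_genes : indices of the genes whose permuted statistics form the
                 null pool (hypothetical parameter; None pools all genes)
    """
    n_total = len(data[0])
    if pool_genes is None:
        pool_genes = range(len(data))
    splits = list(label_splits(n_total, n_first))

    # Build the pooled null: every label assignment for every pooled gene.
    null = []
    for g in pool_genes:
        row = data[g]
        for first, second in splits:
            a = [row[i] for i in first]
            b = [row[i] for i in second]
            null.append(abs(t_stat(a, b)))

    # Compare each observed statistic with the pooled null.
    observed = [abs(t_stat(row[:n_first], row[n_first:])) for row in data]
    return [sum(s >= t for s in null) / len(null) for t in observed]


# Toy data (made up for illustration): gene 0 is clearly shifted, gene 1 is not.
toy = [
    [1, 2, 3, 11, 12, 13],
    [1, 2, 3, 1.5, 2.5, 3.5],
]
p_all = pooled_permutation_pvalues(toy, n_first=3)
p_subset = pooled_permutation_pvalues(toy, n_first=3, pool_genes=[1])
```

Pooling over all genes lets the permuted statistics of differentially expressed genes enter the null, which is the problem the abstract raises; passing `pool_genes` restricts the pool to a chosen subset. How that subset should be chosen is the subject of the paper's guidelines and is not reproduced here. Note that a gene outside the pool can receive an estimated p-value of exactly 0 in this sketch, so a real analysis would need to decide how to handle that case.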