Significance tests for analyzing gene expression data with small sample sizes
Journal article
Ullah, Insha, Paul, Sudhir, Hong, Zhenjie and Wang, You-Gan. (2019). Significance tests for analyzing gene expression data with small sample sizes. Bioinformatics. 35(20), pp. 3996-4003. https://doi.org/10.1093/bioinformatics/btz189
Authors | Ullah, Insha, Paul, Sudhir, Hong, Zhenjie and Wang, You-Gan |
---|---|
Abstract | Motivation: Under two biologically different conditions, we are often interested in identifying differentially expressed genes. It is usually the case that the assumption of equal variances in the two groups is violated for many genes when a large number of them must be filtered or ranked. In these cases, exact tests are unavailable and Welch's approximate test is the most reliable one. Welch's test involves two layers of approximation: approximating the distribution of the statistic by a t-distribution, which in turn depends on approximate degrees of freedom. This study attempts to improve upon Welch's approximate test by avoiding one layer of approximation. Results: We introduce a new distribution that generalizes the t-distribution and propose a Monte Carlo based test that uses only one layer of approximation for statistical inference. Experimental results based on extensive simulation studies show that the Monte Carlo based tests enhance statistical power and perform better than Welch's t-approximation, especially when the equal variance assumption is not met and the sample with the larger variance has the smaller sample size. We analyzed two gene-expression datasets, namely the childhood acute lymphoblastic leukemia gene-expression dataset with 22 283 genes and the Golden Spike dataset produced by a controlled experiment with 13 966 genes. The new test identified additional genes of interest in both datasets. Some of these genes have been shown in the medical literature to play important roles. Availability and implementation: The R package mcBFtest is available on CRAN, and R scripts to reproduce all reported results are available at the GitHub repository, https://github.com/iullah1980/MCTcodes. Supplementary information: Supplementary data are available at Bioinformatics online. |
Keywords | Biometry; Gene Expression; Monte Carlo Method; Sample Size; Statistical Distributions |
Year | 01 Jan 2019 |
Journal | Bioinformatics |
Journal citation | 35 (20), pp. 3996-4003 |
Publisher | Oxford University Press |
ISSN | 1367-4803 |
Digital Object Identifier (DOI) | https://doi.org/10.1093/bioinformatics/btz189 |
PubMed ID | 30874796 |
Web address (URL) | https://academic.oup.com/bioinformatics/article/35/20/3996/5381541 |
Open access | Published as non-open access |
Research or scholarly | Research |
Page range | 3996-4003 |
Publisher's version | License All rights reserved File Access Level Controlled |
Output status | Published |
Publication dates | |
Online | 15 Mar 2019 |
Publication process dates | |
Accepted | 13 Mar 2019 |
Deposited | 11 Jan 2023 |
Additional information | © The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/...) |
Place of publication | United Kingdom |
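
The two layers of approximation that the abstract attributes to Welch's test can be made concrete with a small sketch. The first function computes Welch's statistic together with the Welch-Satterthwaite approximate degrees of freedom. The second estimates a p-value by simulating the statistic's null distribution directly, so no degrees-of-freedom approximation is needed. This is a generic parametric simulation written for illustration, not the authors' mcBFtest implementation: the function names, the normality assumption, and the plug-in of sample standard deviations are all assumptions made here.

```python
import math
import random
import statistics


def welch_statistic(x, y):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    # Estimated variances of the two sample means.
    v1 = statistics.variance(x) / n1
    v2 = statistics.variance(y) / n2
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(v1 + v2)
    # Second layer of approximation in Welch's test: approximate df.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def monte_carlo_p_value(x, y, n_sim=2000, seed=0):
    """Two-sided p-value from a simulated null distribution of the statistic.

    Assumption (illustrative only): data are normal, and the sample standard
    deviations are plugged in for the unknown population values.
    """
    rng = random.Random(seed)
    t_obs, _ = welch_statistic(x, y)
    s1, s2 = statistics.stdev(x), statistics.stdev(y)
    exceed = 0
    for _ in range(n_sim):
        # Simulate both groups under the null hypothesis of equal means.
        xs = [rng.gauss(0.0, s1) for _ in x]
        ys = [rng.gauss(0.0, s2) for _ in y]
        t_sim, _ = welch_statistic(xs, ys)
        if abs(t_sim) >= abs(t_obs):
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Hypothetical expression values for one gene; the smaller group
    # has the larger variance, the setting highlighted in the abstract.
    group_a = [5.1, 3.8, 7.3]
    group_b = [7.9, 8.4, 8.1, 7.7, 8.6]
    t, df = welch_statistic(group_a, group_b)
    print(f"t = {t:.3f}, approximate df = {df:.2f}")
    print(f"Monte Carlo p-value = {monte_carlo_p_value(group_a, group_b):.4f}")
```

In a gene-expression setting the same two functions would be applied gene by gene, with the resulting p-values used for filtering or ranking.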
https://acuresearchbank.acu.edu.au/item/8y92y/significance-tests-for-analyzing-gene-expression-data-with-small-sample-sizes