Significance tests for analyzing gene expression data with small sample sizes

Journal article


Ullah, Insha, Paul, Sudhir, Hong, Zhenjie and Wang, You-Gan. (2019). Significance tests for analyzing gene expression data with small sample sizes. Bioinformatics. 35(20), pp. 3996-4003. https://doi.org/10.1093/bioinformatics/btz189
Authors: Ullah, Insha, Paul, Sudhir, Hong, Zhenjie and Wang, You-Gan
Abstract

Motivation: When comparing two biologically different conditions, we are often interested in identifying differentially expressed genes. When a large number of genes must be filtered or ranked, the assumption of equal variances in the two groups is usually violated for many of them. In these cases, exact tests are unavailable and Welch's approximate test is the most reliable one. Welch's test involves two layers of approximation: the distribution of the statistic is approximated by a t-distribution, which in turn depends on approximate degrees of freedom. This study attempts to improve upon Welch's approximate test by avoiding one layer of approximation.
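The two layers of approximation in Welch's test can be made concrete with a minimal sketch. It computes the standard Welch statistic and the Welch-Satterthwaite approximate degrees of freedom using only the Python standard library. The function name `welch_statistic` and the sample values are illustrative assumptions, not taken from the article.

```python
from math import sqrt
from statistics import mean, variance


def welch_statistic(x, y):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    # Squared standard errors of the two sample means (unequal variances allowed).
    v1, v2 = variance(x) / n1, variance(y) / n2
    # The statistic, whose null distribution is approximated by a t-distribution.
    t = (mean(x) - mean(y)) / sqrt(v1 + v2)
    # The degrees of freedom of that t-distribution, themselves an approximation.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Illustrative data only: a small group with low variance and a group with high variance.
t, df = welch_statistic([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0, 10.0])
print(f"t = {t:.4f}, approximate df = {df:.4f}")
```

The approximate degrees of freedom always lie between min(n1, n2) - 1 and n1 + n2 - 2, and they are estimated from the same sample variances that enter the statistic.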

Results: We introduce a new distribution that generalizes the t-distribution and propose a Monte Carlo based test that uses only one layer of approximation for statistical inference. Experimental results based on extensive simulation studies show that the Monte Carlo based test enhances statistical power and performs better than Welch's t-approximation, especially when the equal variance assumption is not met and the sample with the larger variance has the smaller sample size. We analyzed two gene-expression datasets, namely the childhood acute lymphoblastic leukemia gene-expression dataset with 22 283 genes and the Golden Spike dataset, produced by a controlled experiment, with 13 966 genes. The new test identified additional genes of interest in both datasets. Some of these genes have been shown in the medical literature to play important roles.
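The general idea of a Monte Carlo test for the Welch statistic can be sketched as follows. This is not the authors' algorithm and does not use the mcBFtest API. It is a generic plug-in simulation under the assumption of normally distributed data, in which the null distribution of the statistic is simulated with the sample standard deviations substituted for the unknown ones, so no approximate degrees of freedom are needed. The names `welch_t` and `mc_pvalue`, the number of simulations, and the data are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch statistic: mean difference scaled by the unpooled standard error."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))


def mc_pvalue(x, y, n_sim=2000, seed=0):
    """Two-sided Monte Carlo p-value under a plug-in normal null (illustrative sketch)."""
    rng = random.Random(seed)
    t_obs = abs(welch_t(x, y))
    # Plug-in step: sample standard deviations stand in for the unknown ones.
    s1, s2 = sqrt(variance(x)), sqrt(variance(y))
    exceed = 0
    for _ in range(n_sim):
        # Simulate both groups under the null of equal means, unequal variances.
        xs = [rng.gauss(0.0, s1) for _ in x]
        ys = [rng.gauss(0.0, s2) for _ in y]
        if abs(welch_t(xs, ys)) >= t_obs:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (exceed + 1) / (n_sim + 1)


# Illustrative data only: two clearly separated groups with unequal variances.
p = mc_pvalue([5.1, 4.9, 5.3, 5.0], [9.8, 10.4, 10.1, 9.9, 10.2])
print(f"Monte Carlo p-value: {p:.4f}")
```

Because the statistic is unchanged when both groups are rescaled by the same factor, the simulated null distribution depends on the plug-in standard deviations only through their ratio.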

Availability and implementation: The R package mcBFtest is available on CRAN, and R scripts to reproduce all reported results are available at the GitHub repository, https://github.com/iullah1980/MCTcodes.

Supplementary information: Supplementary data is available at Bioinformatics online.

Keywords: Biometry; Gene Expression; Monte Carlo Method; Sample Size; Statistical Distributions
Year: 01 Jan 2019
Journal: Bioinformatics
Journal citation: 35 (20), pp. 3996-4003
Publisher: Oxford University Press
ISSN: 1367-4803
Digital Object Identifier (DOI): https://doi.org/10.1093/bioinformatics/btz189
PubMed ID: 30874796
Web address (URL): https://academic.oup.com/bioinformatics/article/35/20/3996/5381541
Open access: Published as non-open access
Research or scholarly: Research
Page range: 3996-4003
Publisher's version
License: All rights reserved
File access level: Controlled
Output status: Published
Publication dates
Online: 15 Mar 2019
Publication process dates
Accepted: 13 Mar 2019
Deposited: 11 Jan 2023
Additional information

© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/...)

Place of publication: United Kingdom
Permalink: https://acuresearchbank.acu.edu.au/item/8y92y/significance-tests-for-analyzing-gene-expression-data-with-small-sample-sizes
