Efficient and doubly-robust methods for variable selection and parameter estimation in longitudinal data analysis

Journal article


Fu, Liya, Yang, Zhuoran, Cai, Fengjing and Wang, You-Gan. (2021). Efficient and doubly-robust methods for variable selection and parameter estimation in longitudinal data analysis. Computational Statistics. 36(2), pp. 781-804. https://doi.org/10.1007/s00180-020-01038-3
Authors: Fu, Liya, Yang, Zhuoran, Cai, Fengjing and Wang, You-Gan
Abstract

New technologies have produced increasingly complex and massive datasets, such as next-generation sequencing and microarray data in biology, dynamic treatment regimes in clinical trials, and long-term, wide-scale studies in the social sciences. Each study exhibits its own data structure within individuals, within clusters, and possibly across time and space. To draw valid and efficient inferences from such large-dimensional data, we must account for intracluster correlations, varying cluster sizes, and outliers in the response and/or covariate domains. A weighted rank-based method is proposed for selecting variables and estimating parameters simultaneously. The main contribution of the proposed method is fourfold: (1) variable selection using the adaptive lasso is extended to robust rank regression, so that protection against outliers in both the response and the predictor variables is obtained; (2) within-subject correlations are incorporated, so that the efficiency of parameter estimation is improved; (3) the computation is convenient via an existing function in the statistical software R; and (4) the proposed method is proved to have desirable asymptotic properties for a fixed number of covariates (p). Simulation studies are carried out to evaluate the proposed method in a number of scenarios, including cases where p equals the number of subjects. The simulation results indicate that the proposed method is efficient and robust. A hormone dataset is analyzed for illustration. When redundant variables are added as covariates, the penalty approach and weighting schemes are shown to be effective.
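
As a rough illustration of contribution (1), the following minimal sketch evaluates a pairwise (Wilcoxon-type) rank loss combined with an adaptive-lasso penalty. It is not the authors' estimator: the function name `penalized_rank_loss`, the arguments `obs_weights`, `init_beta` and `eps`, and the normalization by the number of pairs are all assumptions made for illustration, and the within-subject correlation modelling described in (2) is omitted entirely.

```python
from itertools import combinations


def penalized_rank_loss(beta, X, y, lam, init_beta, obs_weights=None, eps=1e-8):
    """Pairwise rank loss plus adaptive-lasso penalty (illustrative sketch only).

    beta        : candidate coefficients (no intercept; it cancels in pairwise differences)
    X, y        : covariate rows and responses, as plain Python lists
    lam         : penalty strength (hypothetical tuning parameter)
    init_beta   : initial estimate used to build the adaptive-lasso weights
    obs_weights : hypothetical per-observation weights, e.g. to down-weight
                  observations with outlying covariates
    """
    n = len(y)
    if obs_weights is None:
        obs_weights = [1.0] * n

    # Residuals under the candidate coefficients.
    residuals = [y[i] - sum(b * x for b, x in zip(beta, X[i])) for i in range(n)]

    # Weighted sum of absolute pairwise residual differences: large residuals
    # enter linearly, not quadratically, which limits the pull of response outliers.
    loss = sum(
        obs_weights[i] * obs_weights[j] * abs(residuals[i] - residuals[j])
        for i, j in combinations(range(n), 2)
    )
    n_pairs = n * (n - 1) / 2

    # Adaptive lasso: coefficients with small initial estimates are penalized more.
    penalty = sum(abs(b) / (abs(b0) + eps) for b, b0 in zip(beta, init_beta))

    return loss / n_pairs + lam * penalty


# Toy data in which y depends only on the first covariate.
X = [[1.0, 3.0], [2.0, 1.0], [3.0, 4.0], [4.0, 1.0], [5.0, 5.0]]
y = [2.0 * row[0] for row in X]

at_truth = penalized_rank_loss([2.0, 0.0], X, y, lam=0.1, init_beta=[2.0, 0.5])
elsewhere = penalized_rank_loss([1.0, 1.0], X, y, lam=0.1, init_beta=[2.0, 0.5])
```

In this toy setting the objective is smaller at the sparse true coefficients than at the alternative candidate. An actual fit would minimize the objective over `beta`; because both terms are sums of absolute values, this type of problem can in principle be posed as a least-absolute-deviations regression, although the sketch does not attempt that step.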

Keywords: Correlated data; Outliers; Rank-based method; Variable selection
Year: 01 Jan 2021
Journal: Computational Statistics
Journal citation: 36 (2), pp. 781-804
Publisher: Springer
ISSN: 0943-4062
Digital Object Identifier (DOI): https://doi.org/10.1007/s00180-020-01038-3
Web address (URL): https://link.springer.com/article/10.1007/s00180-020-01038-3
Open access: Published as non-open access
Research or scholarly: Research
Page range: 781-804
Publisher's version
License: All rights reserved
File Access Level: Controlled
Output status: Published
Publication dates
Print: 12 Oct 2020
Publication process dates
Accepted: 01 Oct 2020
Deposited: 11 Jan 2023
Additional information

© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Place of publication: Germany
Permalink: https://acuresearchbank.acu.edu.au/item/8y92v/efficient-and-doubly-robust-methods-for-variable-selection-and-parameter-estimation-in-longitudinal-data-analysis

Restricted files

Publisher's version

