A robust and efficient variable selection method for linear regression

Journal article


Yang, Zhuoran, Fu, Liya, Wang, You-Gan, Dong, Zhixiong and Jiang, Yunlu. (2021). A robust and efficient variable selection method for linear regression. Journal of Applied Statistics. 49(14), pp. 3677-3692. https://doi.org/10.1080/02664763.2021.1962259
Authors: Yang, Zhuoran, Fu, Liya, Wang, You-Gan, Dong, Zhixiong and Jiang, Yunlu
Abstract

Variable selection is fundamental to high-dimensional statistical modeling, and many approaches have been proposed. However, existing variable selection methods do not perform well in the presence of outliers in the response variable and/or the covariates. To ensure a high probability of correct selection and efficient parameter estimation, we investigate a robust variable selection method based on a modified Huber's function with an exponential squared loss tail. We also prove that the proposed method has oracle properties. Furthermore, we carry out simulation studies to evaluate the performance of the proposed method for both p < n and p > n. Our simulation results indicate that the proposed method is efficient and robust against outliers and heavy-tailed distributions. Finally, a real dataset from an air pollution mortality study is used to illustrate the proposed method.
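
The abstract does not state the loss function itself, so the following is only a minimal sketch of what a Huber-type loss with an exponential squared tail could look like. It assumes a quadratic core for |r| <= c and a bounded tail obtained by integrating the score r * exp(-(r^2 - c^2) / gamma). The function names, the threshold c = 1.345 and the tail parameter gamma are illustrative assumptions, not the definitions used in the paper.

```python
import math


def huber_exp_tail_loss(r: float, c: float = 1.345, gamma: float = 1.0) -> float:
    """Illustrative loss: quadratic for |r| <= c, bounded exponential-squared tail beyond c.

    ASSUMPTION: this functional form is a sketch, not the paper's exact definition.
    """
    if abs(r) <= c:
        return 0.5 * r * r
    # The tail starts at c^2 / 2 and saturates at c^2 / 2 + gamma / 2.
    return 0.5 * c * c + 0.5 * gamma * (1.0 - math.exp(-(r * r - c * c) / gamma))


def huber_exp_tail_score(r: float, c: float = 1.345, gamma: float = 1.0) -> float:
    """Derivative of the loss above: linear in the core, redescending towards 0 in the tail."""
    if abs(r) <= c:
        return r
    return r * math.exp(-(r * r - c * c) / gamma)


if __name__ == "__main__":
    # Small residuals are treated as in least squares; large ones are down-weighted.
    for residual in (0.5, 1.345, 3.0, 10.0):
        print(residual, huber_exp_tail_loss(residual), huber_exp_tail_score(residual))
```

Because the loss is bounded, a single gross outlier contributes at most c^2 / 2 + gamma / 2 to the objective under this sketch, which is the intuition behind robustness to outliers in the response.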

Keywords: Oracle properties; penalty function; robustness; variable selection
Year: 01 Jan 2021
Journal: Journal of Applied Statistics
Journal citation: 49(14), pp. 3677-3692
Publisher: Taylor and Francis Ltd.
ISSN: 0266-4763
Digital Object Identifier (DOI): https://doi.org/10.1080/02664763.2021.1962259
PubMed ID: 36246863
PubMed Central ID: PMC9559330
Web address (URL): https://www.tandfonline.com/doi/full/10.1080/02664763.2021.1962259
Open access: Published as green open access
Research or scholarly: Research
Page range: 3677-3692
Publisher's version
License: All rights reserved
File Access Level: Controlled
Output status: Published
Publication dates
Online: 06 Aug 2021
Publication process dates
Accepted: 26 Jul 2021
Deposited: 13 Jan 2023
Supplemental file
License: All rights reserved
File Access Level: Controlled
ARC Funded Research: This output has been funded, wholly or partially, under the Australian Research Council Act 2001
Grant ID: DP160104292
Additional information

© 2021 Informa UK Limited, trading as Taylor & Francis Group.

This research was supported by the National Natural Science Foundation of China (No. 11871390), Australian Research Council Discovery Project (DP160104292), the Fundamental Research Funds for the Central Universities (No. xjj2017180), the Natural Science Basic Research Plan in Shaanxi Province of China (No. 2018JQ1006) and the Natural Science Foundation of Guangdong (Nos. 2018A030313171, 2019A1515011830).

Place of publication: United Kingdom
Permalink: https://acuresearchbank.acu.edu.au/item/8y93x/a-robust-and-efficient-variable-selection-method-for-linear-regression

Related outputs

Energy-efficient virtual machine placement in data centres via an accelerated Genetic Algorithm with improved fitness computation
Hormozi, Elham, Hu, Shuwen, Ding, Zhe, Tian, Yu-Chu, Wang, You-Gan, Yu, Zu-Guo and Zhang, Weizhe. (2022). Energy-efficient virtual machine placement in data centres via an accelerated Genetic Algorithm with improved fitness computation. Energy. 252, pp. 1-15. https://doi.org/10.1016/j.energy.2022.123884
A physics-informed statistical learning framework for forecasting local suspended sediment concentrations in marine environment
Zhang, Shaotong, Wu, Ryan, Wang, You-Gan, Jeng, Dong-Sheng and Li, Guangxue. (2022). A physics-informed statistical learning framework for forecasting local suspended sediment concentrations in marine environment. Water Research. 218, pp. 1-16. https://doi.org/10.1016/j.watres.2022.118518
Robustified extreme learning machine regression with applications in outlier-blended wind-speed forecasting
Yang, Yang, Zhou, Hu, Wu, Ryan, Ding, Zhe and Wang, You-Gan. (2022). Robustified extreme learning machine regression with applications in outlier-blended wind-speed forecasting. Applied Soft Computing. 122, pp. 1-14. https://doi.org/10.1016/j.asoc.2022.108814
An opposition learning and spiral modelling based arithmetic optimization algorithm for global continuous optimization problems
Yang, Yang, Gao, Yuchao, Tan, Shuang, Zhao, Shangrui, Wu, Jinran, Gao, Shangce, Zhang, Tengfei, Tian, Yu-Chu and Wang, You-Gan. (2022). An opposition learning and spiral modelling based arithmetic optimization algorithm for global continuous optimization problems. Engineering Applications of Artificial Intelligence. 113, p. Article 104981. https://doi.org/10.1016/j.engappai.2022.104981
A modified memetic algorithm with an application to gene selection in a sheep body weight study
Miao, Maoxuan, Wu, Jinran, Cai, Fengjing and Wang, You-Gan. (2022). A modified memetic algorithm with an application to gene selection in a sheep body weight study. Animals. 12(2), p. Article 201. https://doi.org/10.3390/ani12020201
Packing computing servers into the vessel of an underwater data center considering cooling efficiency
Hu, Zhi-Hua, Zheng, Yu-Xin and Wang, You-Gan. (2022). Packing computing servers into the vessel of an underwater data center considering cooling efficiency. Applied Energy. 314, p. Article 118986. https://doi.org/10.1016/j.apenergy.2022.118986
Robust regression with asymmetric loss functions
Fu, Liya and Wang, You-Gan. (2021). Robust regression with asymmetric loss functions. Statistical Methods in Medical Research. 30(8), pp. 1800-1815. https://doi.org/10.1177/09622802211012012
A temporal LASSO regression model for the emergency forecasting of the suspended sediment concentrations in coastal oceans: Accuracy and interpretability
Zhang, Shaotong, Wu, Ryan, Jia, Yonggang, Wang, You-Gan, Zhang, Yaqi and Duan, Qibin. (2021). A temporal LASSO regression model for the emergency forecasting of the suspended sediment concentrations in coastal oceans: Accuracy and interpretability. Engineering Applications of Artificial Intelligence. 100, pp. 1-13. https://doi.org/10.1016/j.engappai.2021.104206
Robust approach for variable selection with high dimensional longitudinal data analysis
Fu, Liya, Li, Jiaqi and Wang, You-Gan. (2021). Robust approach for variable selection with high dimensional longitudinal data analysis. Statistics in Medicine. 40(30), pp. 6835-6854. https://doi.org/10.1002/sim.9213
Efficient and doubly-robust methods for variable selection and parameter estimation in longitudinal data analysis
Fu, Liya, Yang, Zhuoran, Cai, Fengjing and Wang, You-Gan. (2021). Efficient and doubly-robust methods for variable selection and parameter estimation in longitudinal data analysis. Computational Statistics. 36(2), pp. 781-804. https://doi.org/10.1007/s00180-020-01038-3
Predictive regression with p-lags and order-q autoregressive predictors
Jayetileke, Harshanie L., Wang, You-Gan and Zhu, Min. (2021). Predictive regression with p-lags and order-q autoregressive predictors. Journal of Empirical Finance. 62, pp. 282-293. https://doi.org/10.1016/j.jempfin.2021.04.006
An efficient Gehan-type estimation for the accelerated failure time model with clustered and censored data
Fu, Liya, Yang, Zhuoran, Zhou, Yan and Wang, You-Gan. (2021). An efficient Gehan-type estimation for the accelerated failure time model with clustered and censored data. Lifetime Data Analysis. 27(4), pp. 679-709. https://doi.org/10.1007/s10985-021-09526-4
Robust estimation procedure for autoregressive models with heterogeneity
Callens, A., Wang, Y.-G., Fu, L. and Liquet, B. (2021). Robust estimation procedure for autoregressive models with heterogeneity. Environmental Modeling and Assessment. 26(3), pp. 313-323. https://doi.org/10.1007/s10666-020-09730-w
Influential factors on Chinese airlines’ profitability and forecasting methods
Xu, Xu, McGrory, Clare Anne, Wang, You-Gan and Wu, Jinran. (2021). Influential factors on Chinese airlines’ profitability and forecasting methods. Journal of Air Transport Management. 91, p. Article 101969. https://doi.org/10.1016/j.jairtraman.2020.101969
Support vector regression with asymmetric loss for optimal electric load forecasting
Wu, Ryan, Wang, You-Gan, Tian, Yu-Chu, Burrage, Kevin and Cao, Taoyun. (2021). Support vector regression with asymmetric loss for optimal electric load forecasting. Energy. 223, p. Article 119969. https://doi.org/10.1016/j.energy.2021.119969
Exact algorithms for energy-efficient virtual machine placement in data centers
Wei, Chen, Hu, Zhi-Hua and Wang, You-Gan. (2020). Exact algorithms for energy-efficient virtual machine placement in data centers. Future Generation Computer Systems. 106, pp. 77-91. https://doi.org/10.1016/j.future.2019.12.043
A working likelihood approach for robust regression
Fu, Liya, Wang, You-Gan and Cai, Fengjing. (2020). A working likelihood approach for robust regression. Statistical Methods in Medical Research. 29(12), pp. 3641-3652. https://doi.org/10.1177/0962280220936310
Maritime convection and fluctuation between Vietnam and China : A data-driven study
Hu, Zhi-Hua, Liu, Chan-Juan, Chen, Wanting, Wang, You-Gan and Wei, Chen. (2020). Maritime convection and fluctuation between Vietnam and China : A data-driven study. Research in Transportation Business and Management. 34, pp. 1-15. https://doi.org/10.1016/j.rtbm.2019.100414
Identifying barley pan-genome sequence anchors using genetic mapping and machine learning
Gao, Shang, Wu, Ryan, Stiller, Jiri, Zheng, Zhi, Zhou, Meixue, Wang, You-Gan and Liu, Chunji. (2020). Identifying barley pan-genome sequence anchors using genetic mapping and machine learning. Theoretical and Applied Genetics. 133(9), pp. 2535-2544. https://doi.org/10.1007/s00122-020-03615-y
Natural mortality estimation using tree-based ensemble learning models
Liu, Chanjuan, Zhou, Shijie, Wang, You-Gan and Hu, Zhi-Hua. (2020). Natural mortality estimation using tree-based ensemble learning models. ICES Journal of Marine Science. 77(4), pp. 1414-1426. https://doi.org/10.1093/icesjms/fsaa058
Profile-guided three-phase virtual resource management for energy efficiency of data centers
Ding, Zhe, Tian, Yu-Chu, Tang, Maolin, Li, Yuefeng, Wang, You-Gan and Zhou, Chunjie. (2020). Profile-guided three-phase virtual resource management for energy efficiency of data centers. IEEE Transactions on Industrial Electronics. 67(3), pp. 2460-2468. https://doi.org/10.1109/TIE.2019.2902786
Incorporating social objectives in evaluating sustainable fisheries harvest strategy
Wu, Jiafeng, Wang, Na, Hu, Zhi-Hua, Hong, Zhenjie and Wang, You-Gan. (2019). Incorporating social objectives in evaluating sustainable fisheries harvest strategy. Environmental Modeling and Assessment. 24(4), pp. 381-386. https://doi.org/10.1007/s10666-019-9651-9
Significance tests for analyzing gene expression data with small sample sizes
Ullah, Insha, Paul, Sudhir, Hong, Zhenjie and Wang, You-Gan. (2019). Significance tests for analyzing gene expression data with small sample sizes. Bioinformatics. 35(20), pp. 3996-4003. https://doi.org/10.1093/bioinformatics/btz189
Robust Estimation Using Modified Huber’s Functions With New Tails
Jiang, Yunlu, Wang, You-Gan, Fu, Liya and Wang, Xueqin. (2019). Robust Estimation Using Modified Huber’s Functions With New Tails. Technometrics. 61(1), pp. 111-122. https://doi.org/10.1080/00401706.2018.1470037
Dividend growth and equity premium predictability
Zhu, Min, Chen, Rui, Du, Ke and Wang, You-Gan. (2018). Dividend growth and equity premium predictability. International Review of Economics and Finance. 56, pp. 125-137. https://doi.org/10.1016/j.iref.2017.10.020
Robust Regression with Data-Dependent Regularization Parameters and Autoregressive Temporal Correlations
Wang, Na, Wang, You-Gan, Hu, Shuwen, Hu, Zhi-Hua, Xu, Jing, Tang, Hongwu and Jin, Guangqiu. (2018). Robust Regression with Data-Dependent Regularization Parameters and Autoregressive Temporal Correlations. Environmental Modeling and Assessment. 23(6), pp. 779-786. https://doi.org/10.1007/s10666-018-9605-7
Analysis of spatial data with a nested correlation structure
Adegboye, Oyelola, Leung, Denis and Wang, You-Gan. (2018). Analysis of spatial data with a nested correlation structure. Journal of the Royal Statistical Society Series C: Applied Statistics. 67(2), pp. 329-354. https://doi.org/10.1111/rssc.12230
Working correlation structure selection in generalized estimating equations
Fu, Liya, Hao, Yangyang and Wang, You-Gan. (2018). Working correlation structure selection in generalized estimating equations. Computational Statistics. 33(2), pp. 983-996. https://doi.org/10.1007/s00180-018-0800-4
Selection of working correlation structure in generalized estimating equations
Wang, You-Gan and Fu, Liya. (2017). Selection of working correlation structure in generalized estimating equations. Statistics in Medicine. 36(14), pp. 2206-2219. https://doi.org/10.1002/sim.7262
Blockwise AICc for model selection in generalized linear models
Song, Guofeng, Dong, Xiaogang, Wu, Jiafeng and Wang, You-Gan. (2017). Blockwise AICc for model selection in generalized linear models. Environmental Modeling and Assessment. 22(6), pp. 523-533. https://doi.org/10.1007/s10666-017-9552-8
A comment on Koh’s “The optimal design of fallible organizations : Invariance of optimal decision threshold and uniqueness of hierarchy and polyarchy structures”
Zhu, Min, Liu, Chang and Wang, You-Gan. (2017). A comment on Koh’s “The optimal design of fallible organizations : Invariance of optimal decision threshold and uniqueness of hierarchy and polyarchy structures”. Social Choice and Welfare. 48(2), pp. 385-392. https://doi.org/10.1007/s00355-016-1009-5
Movement and growth of the coral reef holothuroids Bohadschia argus and Thelenota ananas
Purcell, Steven W., Piddocke, Toby P., Dalton, Steven J. and Wang, You-Gan. (2016). Movement and growth of the coral reef holothuroids Bohadschia argus and Thelenota ananas. Marine Ecology Progress Series. 551, pp. 201-214. https://doi.org/10.3354/meps11720
Improved confidence intervals for the linkage disequilibrium method for estimating effective population size
Jones, A. T., Ovenden, J. R. and Wang, Y.-G. (2016). Improved confidence intervals for the linkage disequilibrium method for estimating effective population size. Heredity. 117(4), pp. 217-223. https://doi.org/10.1038/hdy.2016.19