WEIGHTED LASSO ESTIMATES FOR SPARSE LOGISTIC REGRESSION: NON-ASYMPTOTIC PROPERTIES WITH MEASUREMENT ERRORS

Huamei HUANG; Yujing GAO; Huiming ZHANG; Bo LI

doi:10.1007/s10473-021-0112-6

Acta mathematica scientia, Series B, 2021, Vol. 41, Issue 1: 207-230

1. Department of Statistics and Finance, University of Science and Technology of China, Hefei 230026, China;
2. Guanghua School of Management, Peking University, Beijing 100871, China;
3. School of Mathematical Sciences, Peking University, Beijing 100871, China;
4. School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China

Huamei HUANG, E-mail: huanghm@mail.ustc.edu.cn; Yujing GAO, E-mail: jane.g1996@pku.edu.cn; Huiming ZHANG, E-mail: zhanghuiming@pku.edu.cn

Received date: 2019-11-06

Revised date: 2020-09-17

Online published: 2021-04-06

Supported by

Huamei Huang, Yujing Gao and Huiming Zhang are co-first authors who contributed equally to this work. This work was supported by the National Natural Science Foundation of China (61877023) and the Fundamental Research Funds for the Central Universities (CCNU19TD009).


Abstract

For high-dimensional models with a focus on classification performance, $\ell_{1}$-penalized logistic regression is becoming important and popular. However, the Lasso estimates can be problematic when the penalties on different coefficients are all the same and unrelated to the data. We propose two types of weighted Lasso estimates whose weights depend on the covariates and are determined by the McDiarmid inequality. Given the sample size $n$ and the covariate dimension $p$, the finite-sample behavior of the proposed method with a diverging number of predictors is characterized by non-asymptotic oracle inequalities, such as bounds on the $\ell_{1}$-estimation error of the unknown parameters and on the squared prediction error. We compare the performance of our method with that of earlier weighted estimates on simulated data, and then apply it to a real data analysis.

Key words: logistic regression; weighted Lasso; oracle inequalities; high-dimensional statistics; measurement error
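
To make the idea of coefficient-specific penalties concrete, the following is a minimal sketch of a weighted $\ell_{1}$-penalized logistic regression fitted by proximal gradient descent. It is an illustration only, not the authors' algorithm: the function names, the simulated data, the tuning parameter `lam`, and in particular the column-scale weights are assumptions chosen for the example, and they are not the McDiarmid-based weights proposed in the paper.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding: the proximal map of t * |.|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def weighted_lasso_logistic(X, y, lam, weights, n_iter=2000):
    """Minimize (1/n) * logistic loss + lam * sum_j weights[j] * |beta_j|, with y in {0, 1}."""
    n, p = X.shape
    # Step size 1/L, where L = ||X||_2^2 / (4n) bounds the Lipschitz
    # constant of the gradient of the averaged logistic loss.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2)
    beta = np.zeros(p)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted probabilities
        grad = X.T @ (prob - y) / n                # gradient of the smooth part
        # Each coordinate is thresholded by its own weighted penalty level.
        beta = soft_threshold(beta - step * grad, step * lam * weights)
    return beta

# Simulated sparse problem (hypothetical sizes and coefficients).
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -2.0, 1.5]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ beta_true)))).astype(float)

# Hypothetical data-dependent weights (root mean square of each column);
# NOT the McDiarmid-based weights of the paper.
weights = np.sqrt(np.mean(X ** 2, axis=0))
beta_hat = weighted_lasso_logistic(X, y, lam=0.1, weights=weights)
```

Setting all entries of `weights` to 1 recovers the ordinary Lasso penalty, which is the case the abstract describes as potentially problematic because the penalty level ignores the data.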

Cite this article

Huamei HUANG, Yujing GAO, Huiming ZHANG, Bo LI. WEIGHTED LASSO ESTIMATES FOR SPARSE LOGISTIC REGRESSION: NON-ASYMPTOTIC PROPERTIES WITH MEASUREMENT ERRORS[J]. Acta mathematica scientia, Series B, 2021, 41(1): 207-230. DOI: 10.1007/s10473-021-0112-6
