ROBUST DEPENDENCE MEASURE FOR DETECTING ASSOCIATIONS IN LARGE DATA SET

  • Hangjin JIANG,
  • Qiongli WU
  • 1. University of Chinese Academy of Sciences, Beijing 100049, China;
    2. Key Laboratory of Magnetic Resonance in Biological Systems, Wuhan Institute of Physics and Mathematics, Chinese Academy of Sciences, Wuhan 430071, China

Received date: 2017-01-26

Online published: 2018-02-25

Supported by

Supported by the National Natural Science Foundation of China (31600290).

Abstract

In this paper, we propose a new copula-based statistical dependence measure for two random vectors, called the copula dependency coefficient (CDC). The CDC is proved to be robust to outliers and is easy to implement. In particular, it is powerful and applicable to high-dimensional problems. These properties make the CDC practically important in related applications. Both experimental and application results show that the CDC is a good robust dependence measure for detecting associations.

Cite this article

Hangjin JIANG, Qiongli WU. ROBUST DEPENDENCE MEASURE FOR DETECTING ASSOCIATIONS IN LARGE DATA SET[J]. Acta Mathematica Scientia, Series B, 2018, 38(1): 57-72. DOI: 10.1016/S0252-9602(17)30117-0
