Advisor: Mathias Drton


Two hypothesis testing problems related to high-dimensional covariance/correlation structures will be presented.

1. Non-parametric test of independence in high dimensions. We treat the problem of testing independence between m continuous random variables when m can be larger than the available sample size n. We consider three types of test statistics that are constructed as sums of many pairwise rank correlation signals. In the asymptotic regime where both m and n tend to infinity, a martingale central limit theorem is applied to show that the null distributions of these statistics converge to Gaussian limits, which are valid without any specific distributional or moment assumptions on the data. Using the framework of U-statistics, our result covers a variety of rank correlations, including Kendall's tau and a dominating term of Spearman's rank correlation coefficient (rho), as well as degenerate U-statistics such as Hoeffding's D or the tau* of Bergsma and Dassios (2014). As in the classical theory for U-statistics, the test statistics need to be scaled differently when the rank correlations used to construct them are degenerate U-statistics. The power of the considered tests is studied through rate-optimality theory under a Gaussian equicorrelation alternative, as well as in numerical experiments for specific cases of more general alternatives.
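As a rough illustration of a statistic built from pairwise rank correlation signals, the sketch below sums the squared Kendall's tau over all pairs of variables. The choice of this particular sum, the function names, and the Monte Carlo calibration are assumptions made for the example; the calibration replaces the Gaussian limit described above and relies only on the fact that, for continuous data, the null distribution of rank statistics is the same as for independent random permutations.

```python
import numpy as np


def kendall_tau_matrix(data):
    """Pairwise Kendall's tau for the columns of an (n, m) array without ties."""
    data = np.asarray(data, dtype=float)
    n, m = data.shape
    cols = data.T                                          # shape (m, n)
    # signs[k, i, j] = sign(x_ik - x_jk) for variable k and observations i, j
    signs = np.sign(cols[:, :, None] - cols[:, None, :])   # shape (m, n, n)
    flat = signs.reshape(m, n * n)
    # Each unordered pair {i, j} is counted twice, hence n(n-1) = 2 * C(n, 2).
    return flat @ flat.T / (n * (n - 1))


def sum_squared_tau(data):
    """Sum of squared Kendall's tau over all pairs of distinct variables."""
    tau = kendall_tau_matrix(data)
    upper = np.triu_indices(tau.shape[0], k=1)
    return float(np.sum(tau[upper] ** 2))


def null_mean(n, m):
    """Mean of the statistic under independence, using the classical
    null variance of Kendall's tau, 2(2n+5) / (9n(n-1))."""
    var_tau = 2.0 * (2 * n + 5) / (9.0 * n * (n - 1))
    return m * (m - 1) / 2.0 * var_tau


def monte_carlo_pvalue(data, n_sim=200, seed=0):
    """Monte Carlo p-value: under the null each column's ranks form an
    independent uniformly random permutation (illustrative calibration)."""
    rng = np.random.default_rng(seed)
    n, m = np.asarray(data).shape
    observed = sum_squared_tau(data)
    exceed = 0
    for _ in range(n_sim):
        ranks = np.column_stack([rng.permutation(n) for _ in range(m)])
        if sum_squared_tau(ranks) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_sim)
```

The Monte Carlo step is only practical for moderate m and n; the appeal of a Gaussian limit is precisely that it avoids this simulation when both are large.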

2. Testing high-dimensional hidden factor models. Let X_1, ..., X_n be independent and identically distributed observations of an m-dimensional random vector X that follows a multivariate Gaussian distribution with covariance matrix Σ. In the high-dimensional setup where m can be much larger than n, we consider the problem of testing the null hypothesis that X comes from a k-latent factor model, i.e., there exists an unobserved k-dimensional random vector Y such that (X, Y) are jointly Gaussian and the components of X are conditionally independent given Y. By exploiting the algebraic fact that a certain class of polynomial functions in the entries of Σ, known as model invariants, vanish when the null is true, our test is based on a statistic constructed as the maximum of the absolute values of all such model invariants evaluated at the entries of the sample covariance matrix computed from the data X_1, ..., X_n. This general strategy for constructing test statistics can be extended to test the validity of linear latent factor models that are not necessarily Gaussian, and also of latent factor structures underlying a Gaussian copula model, in which case the model invariants in the Kendall's tau correlation matrix are considered. The critical value of our test statistic can be calibrated using a wild bootstrap technique.
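To make the idea of model invariants concrete, the sketch below treats only the one-factor case, where the classical invariants are the tetrads s_ij * s_kl - s_ik * s_jl over four distinct indices. Restricting to one factor, the function names, and the brute-force enumeration are assumptions for illustration; the abstract does not specify which invariants are used for a general number of factors, and the wild bootstrap calibration is not reproduced here.

```python
import itertools

import numpy as np


def max_abs_tetrad(S):
    """Maximum absolute tetrad of a symmetric matrix S.

    For every set of four distinct indices {i, j, k, l}, the three products
    S_ij*S_kl, S_ik*S_jl and S_il*S_jk coincide under a one-factor model,
    so all of their pairwise differences (the tetrads) vanish.
    """
    S = np.asarray(S, dtype=float)
    best = 0.0
    for i, j, k, l in itertools.combinations(range(S.shape[0]), 4):
        a = S[i, j] * S[k, l]
        b = S[i, k] * S[j, l]
        c = S[i, l] * S[j, k]
        best = max(best, abs(a - b), abs(a - c), abs(b - c))
    return best


def tetrad_statistic(data):
    """Max absolute tetrad of the sample covariance of an (n, m) data array."""
    return max_abs_tetrad(np.cov(np.asarray(data, dtype=float), rowvar=False))
```

On a population covariance of the form (outer product of loadings) plus a diagonal matrix, `max_abs_tetrad` returns zero up to rounding, since only off-diagonal entries enter the tetrads. The enumeration grows like the fourth power of the dimension, so this brute-force version is suitable only for small examples.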