Modified Multidimensional Scaling
Classical multidimensional scaling is an important tool for data reduction in many applications. It takes in a distance matrix and outputs low-dimensional embedded samples such that the pairwise distances between the original data points can be preserved, when treating them as deterministic points. However, data are often noisy in practice. In such case, the quality of embedded samples produced by classical multidimensional scaling starts to break down, when either the ambient dimensionality or the noise variance gets larger. This motivates us to propose the modified multidimensional scaling procedure which applies a nonlinear shrinkage to the sample eigenvalues. The nonlinear transformation is determined by sample size, the ambient dimensionality, and moment of noise. As an application, we consider the problem of clustering high-dimensional noisy data. We show that modified multidimensional scaling followed by various clustering algorithms can achieve exact recovery, i.e., all the cluster labels can be recovered correctly with probability tending to one. Numerical simulations and two real data applications lend strong support to our proposed methodology.