Subspace recovery from noisy or even corrupted data is critical for various applications in machine learning and data analysis. To detect outliers, Robust PCA (R-PCA) via Outlier Pursuit was proposed and has found many successful applications. However, the current theoretical analysis of Outlier Pursuit only shows that it succeeds when the sparsity of the corruption matrix is O(n/r), where n is the number of samples and r is the rank of the intrinsic matrix, which may be comparable to n. Moreover, the suggested regularization parameter is 3/(7\sqrt{\gamma n}), where γ is a parameter that is not known a priori. In this paper, under an incoherence condition and a proposed ambiguity condition, we prove that Outlier Pursuit succeeds when the rank of the intrinsic matrix is O(n/\log n) and the sparsity of the corruption matrix is O(n). We further show that the orders of both bounds are tight. Thus R-PCA via Outlier Pursuit is able to recover intrinsic matrices of higher rank and identify much denser corruptions than the existing results predict. Moreover, we suggest that the regularization parameter be chosen as 1/\sqrt{\log n}, a definite value that involves no unknown parameter. Our analysis removes the need to tune the regularization parameter and significantly extends the working range of Outlier Pursuit. Experiments on synthetic and real data verify our theoretical results.
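For readers who want to try the suggested parameter, the following is a minimal sketch and not the authors' implementation. It assumes the standard Outlier Pursuit program, minimize ||L||_* + λ||S||_{2,1} subject to M = L + S, with samples stored as the columns of M, so that n is the number of columns. The inexact augmented Lagrangian solver, the function name `outlier_pursuit`, and the step-size constants are illustrative choices that the abstract does not specify; only the default λ = 1/\sqrt{\log n} comes from the text.

```python
import numpy as np

def outlier_pursuit(M, lam=None, tol=1e-7, max_iter=500):
    """Sketch: min ||L||_* + lam * ||S||_{2,1}  s.t.  M = L + S.

    Columns of M are samples. Solver and constants are assumptions
    made for illustration, not taken from the paper.
    """
    M = np.asarray(M, dtype=float)
    n = M.shape[1]                       # number of samples (needs n >= 2)
    if lam is None:
        lam = 1.0 / np.sqrt(np.log(n))   # parameter suggested in the abstract
    norm_M = np.linalg.norm(M)           # Frobenius norm
    mu = 1.25 / np.linalg.norm(M, 2)     # assumed initial penalty weight
    mu_max, rho = mu * 1e7, 1.5          # assumed penalty growth schedule
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                 # Lagrange multiplier
    for _ in range(max_iter):
        # L-step: singular value thresholding with threshold 1/mu
        U, s, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # S-step: shrink each column's l2 norm by lam/mu
        R = M - L + Y / mu
        col_norms = np.maximum(np.linalg.norm(R, axis=0), 1e-12)
        S = R * np.maximum(1.0 - (lam / mu) / col_norms, 0.0)
        # Multiplier and penalty updates
        resid = M - L - S
        Y = Y + mu * resid
        mu = min(rho * mu, mu_max)
        if np.linalg.norm(resid) <= tol * norm_M:
            break
    return L, S
```

Columns of the returned `S` with large l2 norm are the candidate outliers, and the column space of `L` estimates the intrinsic subspace. Passing `lam` explicitly overrides the default, for example to compare against the earlier 3/(7\sqrt{\gamma n}) rule when γ is known.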

Publication

AAAI Conference on Artificial Intelligence