Principal component analysis (PCA) is used in diverse settings for dimensionality reduction. When all data elements are the same size, many approaches exist for estimating the PCA decomposition of the dataset. However, many datasets contain elements of different sizes that must be coerced into a fixed size before analysis. This coercion introduces errors into the resulting PCA decomposition. We introduce CO-MPCA, a nonlinear method of directly estimating the PCA decomposition from datasets with elements of different sizes. We compare our method with two baseline approaches on three datasets: a synthetic vector dataset, a synthetic image dataset, and a real dataset of color histograms extracted from surveillance video. We provide quantitative and qualitative evidence that CO-MPCA gives a more accurate estimate of the PCA basis than these baselines.
Zhai, M., Shi, F., Duncan, D., & Jacobs, N. (2014). Covariance-Based PCA for Multi-Size Data. In International Conference on Pattern Recognition (ICPR).
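
To make the coerce-then-analyze setting concrete, the following is a minimal sketch of the general kind of pipeline the abstract describes: vectors of different lengths are resampled to one fixed length, and the leading principal component is then estimated from the sample covariance. It is not CO-MPCA, and it is not claimed to match either baseline evaluated in the paper. The function names (`resample`, `covariance`, `leading_component`), the use of linear interpolation, the power-iteration solver, and the toy ramp data are all assumptions made for illustration.

```python
import math

def resample(vec, size):
    """Linearly interpolate `vec` onto `size` evenly spaced points.
    Assumes len(vec) >= 2 and size >= 2 (illustrative helper)."""
    n = len(vec)
    out = []
    for i in range(size):
        pos = i * (n - 1) / (size - 1)  # fractional index into vec
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1 - frac) * vec[lo] + frac * vec[hi])
    return out

def covariance(rows):
    """Sample covariance matrix of equal-length rows (one row per element)."""
    m, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / m for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (m - 1) for b in range(d)]
            for a in range(d)]

def leading_component(cov, iters=200):
    """Leading eigenvector of `cov` by power iteration.
    Assumes the start vector is not orthogonal to that eigenvector."""
    d = len(cov)
    v = [float(i + 1) for i in range(d)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

# Toy multi-size dataset (hypothetical): the same ramp on [0, 1],
# scaled by a different amplitude and sampled at a different length.
lengths = [5, 9, 12, 7]
amplitudes = [1.0, 2.5, -0.5, 4.0]
raw = [[a * k / (n - 1) for k in range(n)] for n, a in zip(lengths, amplitudes)]

# Coerce every element to a common size, then run ordinary PCA.
fixed = [resample(v, 8) for v in raw]
component = leading_component(covariance(fixed))
print([round(x, 3) for x in component])
```

The ramp is chosen because linear interpolation reproduces it exactly, so the recovered component is simply the normalized ramp. For signals that are not piecewise linear, the `resample` step alters the data before the covariance is computed, which is one place where the coercion errors described above can enter.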