Problem 2043. Six Steps to PCA - Step 1: Centre and Standardize
Introduction
Principal Component Analysis (PCA) is a classic among the many methods of multivariate data analysis. Invented in 1901 by Karl Pearson, the method is used today mostly as a tool for exploratory data analysis and dimension reduction, but also for building predictive models in machine learning.
Step 1: Centre and Standardize
The first step in many multivariate methods is to remove the influence of location and scale from the variables in the raw data. The result, Z, is commonly known as the z-scores of X: a transformation of X in which each column is centred to have mean 0 and scaled to have standard deviation 1 (unless a column of X is constant, in which case the corresponding column of Z is constant at 0). Strictly speaking, z-scores are based on population parameters, whereas the analogous calculation based on the sample mean and standard deviation is the Student's t-statistic.
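The transformation described above can be written out as a formula. This is a sketch under assumptions the problem does not state: the symbols n (number of rows), x_ij, mu_j and sigma_j are notation introduced here, and the n - 1 divisor is assumed because it is the default normalization of MATLAB's std.

```latex
\[
\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij},
\qquad
\sigma_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij}-\mu_j\right)^2},
\qquad
z_{ij} =
\begin{cases}
\dfrac{x_{ij}-\mu_j}{\sigma_j}, & \sigma_j > 0,\\[1.5ex]
0, & \sigma_j = 0.
\end{cases}
\]
```

For example, the column (1, 3, 5) has mean 3 and sample standard deviation 2, so its z-scores are (-1, 0, 1).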
Task
Write a function to centre and standardize the input matrix X, returning as the output a structure with the following fields:
- Z: the centred and standardized matrix corresponding to the input X
- Mu: a vector of the original means of columns of X
- Sigma: a vector of the original standard deviations of columns of X
Tips
- MATLAB's zscore function is part of the Statistics Toolbox, which is not available in Cody. You'll have to write your own.
- You should take care to avoid division by zero when a column is invariant.
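The task and tips above could be approached along the following lines. This is a minimal sketch, not a reference solution: the function name centre_and_standardize is hypothetical (the name required by the test suite is not given here), X is assumed to be a non-empty numeric matrix without NaNs, and the n - 1 normalization of std is assumed. Constant columns are detected by exact comparison of the column minimum and maximum, which is one possible choice among several.

```matlab
function S = centre_and_standardize(X)
% CENTRE_AND_STANDARDIZE  Hypothetical name; column-wise z-scores of X.
% Returns a structure with fields Z, Mu and Sigma.

    Mu    = mean(X, 1);        % 1-by-p row vector of column means
    Sigma = std(X, 0, 1);      % column standard deviations (n-1 divisor, assumed)

    % A column is treated as constant when its smallest and largest
    % entries are identical. This avoids relying on Sigma being exactly
    % zero, which rounding in the mean can prevent.
    isConst = (max(X, [], 1) == min(X, [], 1));
    Sigma(isConst) = 0;        % report an exact zero for constant columns

    divisor = Sigma;
    divisor(isConst) = 1;      % placeholder divisor, avoids division by zero

    % bsxfun is used so the sketch also runs on releases without
    % implicit expansion.
    Z = bsxfun(@rdivide, bsxfun(@minus, X, Mu), divisor);
    Z(:, isConst) = 0;         % constant columns map to all zeros

    S = struct('Z', Z, 'Mu', Mu, 'Sigma', Sigma);
end
```

Called as S = centre_and_standardize([1 2; 3 2; 5 2]), this sketch gives S.Mu = [3 2], S.Sigma = [2 0] and S.Z = [-1 0; 0 0; 1 0]. Testing for exactly equal entries is a deliberate simplification; if columns that are only nearly constant should also be zeroed, a tolerance on Sigma would be needed, and its value would have to be chosen for the data at hand.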
Problem Comments
Your definition of constant (or invariant) data using rand is problematic. If you increase the size of your data (n=1000, n=10000, ...), you can always increase the deviations (so what threshold should be used for sigma?). I think that with real data this artifact isn't possible. No?