# pca

Principal component analysis of raw data

## Syntax

```
coeff = pca(X)
coeff = pca(X,Name,Value)
[coeff,score,latent] = pca(___)
[coeff,score,latent,tsquared] = pca(___)
[coeff,score,latent,tsquared,explained,mu] = pca(___)
```

## Description


`coeff = pca(X)` returns the principal component coefficients, also known as loadings, for the n-by-p data matrix `X`. Rows of `X` correspond to observations and columns correspond to variables. The coefficient matrix is p-by-p. Each column of `coeff` contains coefficients for one principal component, and the columns are in descending order of component variance. By default, `pca` centers the data and uses the singular value decomposition (SVD) algorithm.
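The default computation described above can be sketched with base MATLAB only: center the columns, take the economy-size SVD of the centered data, and use the right singular vectors as coefficients. This is a minimal sketch, not the `pca` source. The matrix `X` is the `ingredients` data from `load hald`, typed in so the block runs without the toolbox. The final sign flip (making the largest-magnitude entry of each column positive) is an assumption: singular vectors are only defined up to sign, and this rule is chosen because it matches the signs of the `coeff` output shown in the examples.

```matlab
% Hald cement data: the ingredients matrix from "load hald" (13-by-4)
X = [ 7 26  6 60;  1 29 15 52; 11 56  8 20; 11 31  8 47;
      7 52  6 33; 11 55  9 22;  3 71 17  6;  1 31 22 44;
      2 54 18 22; 21 47  4 26;  1 40 23 34; 11 66  9 12;
     10 68  8 12];

Xc = X - mean(X, 1);            % center each column (implicit expansion)

% Economy-size SVD, Xc = U*S*V'; the columns of V are the principal
% directions, already ordered by decreasing singular value
[~, ~, V] = svd(Xc, 'econ');

% Assumed sign convention: flip each column so that its
% largest-magnitude element is positive
[~, idx] = max(abs(V), [], 1);
colsign = sign(V(sub2ind(size(V), idx, 1:size(V, 2))));
coeff = V .* colsign;
```

Because `V` has orthonormal columns, `coeff'*coeff` is the 4-by-4 identity.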


`coeff = pca(X,Name,Value)` returns any of the output arguments in the previous syntaxes using additional options for computation and handling of special data types, specified by one or more `Name,Value` pair arguments. For example, you can specify the number of principal components `pca` returns or an algorithm other than SVD to use.


`[coeff,score,latent] = pca(___)` also returns the principal component scores in `score` and the principal component variances in `latent`. You can use any of the input arguments in the previous syntaxes. Principal component scores are the representations of `X` in the principal component space. Rows of `score` correspond to observations, and columns correspond to components. The principal component variances are the eigenvalues of the covariance matrix of `X`.
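A sketch of how `score` and `latent` relate to the SVD of the centered data, on the typed-in `hald` ingredients and using only base MATLAB (`svd`, `cov`, `eig`, `var`). It assumes the default settings (centered data, no weights): the scores are the centered data expressed in the coefficient basis, and the variances are the squared singular values divided by n – 1, which coincide with the eigenvalues of the covariance matrix.

```matlab
% Hald ingredients data (13-by-4), as returned by "load hald"
X = [ 7 26  6 60;  1 29 15 52; 11 56  8 20; 11 31  8 47;
      7 52  6 33; 11 55  9 22;  3 71 17  6;  1 31 22 44;
      2 54 18 22; 21 47  4 26;  1 40 23 34; 11 66  9 12;
     10 68  8 12];
n = size(X, 1);
Xc = X - mean(X, 1);              % centered data

[U, S, V] = svd(Xc, 'econ');
score  = Xc * V;                  % coordinates in the principal component basis (= U*S)
latent = diag(S).^2 / (n - 1);    % variance of each column of score

% The same variances are the eigenvalues of the sample covariance matrix
eigCov = sort(eig(cov(X)), 'descend');
```

Column signs of `score` here follow `svd` and may differ from `pca`'s output; the variances do not depend on sign.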


`[coeff,score,latent,tsquared] = pca(___)` also returns the Hotelling's T-squared statistic for each observation in `X`.


`[coeff,score,latent,tsquared,explained,mu] = pca(___)` also returns `explained`, the percentage of the total variance explained by each principal component, and `mu`, the estimated mean of each variable in `X`.
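`explained` and `mu` follow from quantities already defined: `mu` is the column mean, and `explained` rescales `latent` so that it sums to 100. A minimal sketch under the same assumptions (default centering, no weights) on the typed-in `hald` ingredients; `cumExplained` is an illustrative extra, handy for statements such as how much variability the first few components explain.

```matlab
% Hald ingredients data (13-by-4), as returned by "load hald"
X = [ 7 26  6 60;  1 29 15 52; 11 56  8 20; 11 31  8 47;
      7 52  6 33; 11 55  9 22;  3 71 17  6;  1 31 22 44;
      2 54 18 22; 21 47  4 26;  1 40 23 34; 11 66  9 12;
     10 68  8 12];

mu = mean(X, 1);                             % estimated mean of each variable
Xc = X - mu;
latent = svd(Xc).^2 / (size(X, 1) - 1);      % component variances, descending
explained = 100 * latent / sum(latent);      % percent of total variance
cumExplained = cumsum(explained);            % running total, ends at 100
```

The total variance `sum(latent)` equals the sum of the column variances of `X`, so `explained` measures each component's share of that total.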

## Examples


Load the sample data set.

`load hald`

The ingredients data has 13 observations for 4 variables.

Find the principal components for the ingredients data.

`coeff = pca(ingredients)`
```
coeff = 4×4

   -0.0678   -0.6460    0.5673    0.5062
   -0.6785   -0.0200   -0.5440    0.4933
    0.0290    0.7553    0.4036    0.5156
    0.7309   -0.1085   -0.4684    0.4844
```

The rows of `coeff` contain the coefficients for the four ingredient variables, and its columns correspond to four principal components.

Find the principal component coefficients when there are missing values in a data set.

Load the sample data set.

`load imports-85`

Data matrix `X` has 13 continuous variables in columns 3 to 15: wheel-base, length, width, height, curb-weight, engine-size, bore, stroke, compression-ratio, horsepower, peak-rpm, city-mpg, and highway-mpg. The variables bore and stroke are missing four values in rows 56 to 59, and the variables horsepower and peak-rpm are missing two values in rows 131 and 132.

Perform principal component analysis.

```coeff = pca(X(:,3:15)); ```

By default, `pca` performs the action specified by the `'Rows','complete'` name-value pair argument. This option removes the observations with `NaN` values before calculation. Rows of `NaN`s are reinserted into `score` and `tsquared` at the corresponding locations, namely rows 56 to 59, 131, and 132.
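The `'complete'` behavior (listwise deletion, then `NaN` rows reinserted into `score`) can be sketched as follows. The 6-by-3 matrix is hypothetical. Following the description of `'Centered'`, which says the mean is found with any available data, this sketch centers with the mean of the non-`NaN` values in each column; it does not reproduce the degrees-of-freedom truncation or other details of the actual implementation.

```matlab
% Hypothetical 6-by-3 data with missing values in rows 2 and 4
X = [2 1 4; 3 NaN 5; 4 3 7; NaN 4 8; 6 5 9; 7 8 11];
n = size(X, 1);
avail = ~isnan(X);                 % mask of available entries

% Column means of the available (non-NaN) values
Xz = X;
Xz(~avail) = 0;
mu = sum(Xz, 1) ./ sum(avail, 1);

keep = all(avail, 2);              % listwise deletion: complete rows only
Xc = X(keep, :) - mu;
[~, ~, V] = svd(Xc, 'econ');

% Scores for the complete rows; NaN rows stay at their original positions
score = NaN(n, size(V, 2));
score(keep, :) = Xc * V;
```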

Use `'pairwise'` to perform the principal component analysis.

```coeff = pca(X(:,3:15),'Rows','pairwise'); ```

In this case, `pca` computes the (i,j) element of the covariance matrix using the rows that have no `NaN` values in either column i or column j of `X`. Note that the resulting covariance matrix might not be positive definite. This option applies only when `pca` uses the eigenvalue decomposition algorithm. When you don’t specify the algorithm, as in this example, `pca` sets it to `'eig'`. If you specify `'svd'` as the algorithm together with the `'pairwise'` option, then `pca` returns a warning message, sets the algorithm to `'eig'`, and continues.
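The pairwise rule for each covariance element can be written as a double loop. This sketch uses hypothetical data and assumes each element is centered with the means of the same pairwise-complete rows, a detail the text does not specify. Because each element uses a different subset of rows, the assembled matrix need not be positive semidefinite, so `eig(C)` can in general return negative values.

```matlab
% Hypothetical data: NaN in row 1 of column 3 and row 4 of column 2
X = [1 2 NaN; 2 4 1; 3 5 2; 4 NaN 3; 5 9 5];
p = size(X, 2);
C = zeros(p);
for i = 1:p
    for j = i:p
        ok = ~isnan(X(:, i)) & ~isnan(X(:, j));   % rows usable for this pair
        xi = X(ok, i) - mean(X(ok, i));
        xj = X(ok, j) - mean(X(ok, j));
        C(i, j) = (xi' * xj) / (nnz(ok) - 1);
        C(j, i) = C(i, j);                        % keep C symmetric
    end
end
latent = sort(eig(C), 'descend');   % can include negative values in general
```

For example, `C(1,2)` uses only rows 1, 2, 3, and 5, because row 4 is missing in column 2.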

If you use the `'Rows','all'` name-value pair argument, `pca` terminates because this option assumes there are no missing values in the data set.

```coeff = pca(X(:,3:15),'Rows','all'); ```
```
Error using pca (line 180)
Raw data contains NaN missing value while 'Rows' option is set to 'all'. Consider using 'complete' or pairwise' option instead.
```

Use the inverse variable variances as weights while performing the principal components analysis.

Load the sample data set.

`load hald`

Perform the principal component analysis using the inverse of variances of the ingredients as variable weights.

```
[wcoeff,~,latent,~,explained] = pca(ingredients,...
    'VariableWeights','variance')
```
```
wcoeff = 4×4

   -2.7998    2.9940   -3.9736    1.4180
   -8.7743   -6.4411    4.8927    9.9863
    2.5240   -3.8749   -4.0845    1.7196
    9.1714    7.5529    3.2710   11.3273
```
```
latent = 4×1

    2.2357
    1.5761
    0.1866
    0.0016
```
```
explained = 4×1

   55.8926
   39.4017
    4.6652
    0.0406
```

Note that the coefficient matrix, `wcoeff`, is not orthonormal.

Calculate the orthonormal coefficient matrix.

`coefforth = inv(diag(std(ingredients)))* wcoeff`
```
coefforth = 4×4

   -0.4760    0.5090   -0.6755    0.2411
   -0.5639   -0.4139    0.3144    0.6418
    0.3941   -0.6050   -0.6377    0.2685
    0.5479    0.4512    0.1954    0.6767
```

Check orthonormality of the new coefficient matrix, `coefforth`.

` coefforth*coefforth'`
```
ans = 4×4

    1.0000    0.0000    0.0000    0.0000
    0.0000    1.0000   -0.0000   -0.0000
    0.0000   -0.0000    1.0000    0.0000
    0.0000   -0.0000    0.0000    1.0000
```
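With `'VariableWeights','variance'` and centering, the analysis is based on the correlation matrix. The sketch below standardizes the typed-in `hald` ingredients, takes the SVD, and repeats the orthonormalization step with base MATLAB only. The relation `wcoeff = diag(sigma)*V` is an assumption, inferred from the transformation used above (multiplying by the inverse standard deviations gives an orthonormal matrix); column signs may differ from `pca`'s output.

```matlab
% Hald ingredients data (13-by-4), as returned by "load hald"
X = [ 7 26  6 60;  1 29 15 52; 11 56  8 20; 11 31  8 47;
      7 52  6 33; 11 55  9 22;  3 71 17  6;  1 31 22 44;
      2 54 18 22; 21 47  4 26;  1 40 23 34; 11 66  9 12;
     10 68  8 12];
n = size(X, 1);
sigma = std(X, 0, 1);                  % sample standard deviations
Z = (X - mean(X, 1)) ./ sigma;         % centered and standardized columns

[~, S, V] = svd(Z, 'econ');
latent = diag(S).^2 / (n - 1);         % eigenvalues of the correlation matrix
explained = 100 * latent / sum(latent);

% Assumed relation to the weighted coefficients: wcoeff = diag(sigma)*V,
% so diag(1./sigma)*wcoeff recovers the orthonormal matrix V
wcoeff = diag(sigma) * V;
coefforth = diag(1 ./ sigma) * wcoeff;
```

Since each standardized variable has variance 1, the four variances sum to 4, the number of variables.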

Find the principal components using the alternating least squares (ALS) algorithm when there are missing values in the data.

Load the sample data.

`load hald`

The ingredients data has 13 observations for 4 variables.

Perform principal component analysis on the complete data and display the component coefficients.

```[coeff,score,latent,tsquared,explained] = pca(ingredients); coeff```
```
coeff = 4×4

   -0.0678   -0.6460    0.5673    0.5062
   -0.6785   -0.0200   -0.5440    0.4933
    0.0290    0.7553    0.4036    0.5156
    0.7309   -0.1085   -0.4684    0.4844
```

Introduce missing values randomly.

```
y = ingredients;
rng('default'); % for reproducibility
ix = random('unif',0,1,size(y))<0.30;
y(ix) = NaN
```
```
y = 13×4

     7    26     6   NaN
     1    29    15    52
   NaN   NaN     8    20
    11    31   NaN    47
     7    52     6    33
   NaN    55   NaN   NaN
   NaN    71   NaN     6
     1    31   NaN    44
     2   NaN   NaN    22
    21    47     4    26
      ⋮
```

Approximately 30% of the data has missing values now, indicated by `NaN`.

Perform principal component analysis using the ALS algorithm and display the component coefficients.

```
[coeff1,score1,latent,tsquared,explained,mu1] = pca(y,...
    'algorithm','als');
coeff1
```
```
coeff1 = 4×4

   -0.0362    0.8215   -0.5252    0.2190
   -0.6831   -0.0998    0.1828    0.6999
    0.0169    0.5575    0.8215   -0.1185
    0.7292   -0.0657    0.1261    0.6694
```

Display the estimated mean.

`mu1`
```mu1 = 1×4 8.9956 47.9088 9.0451 28.5515 ```

Reconstruct the observed data.

`t = score1*coeff1' + repmat(mu1,13,1)`
```
t = 13×4

    7.0000   26.0000    6.0000   51.5250
    1.0000   29.0000   15.0000   52.0000
   10.7819   53.0230    8.0000   20.0000
   11.0000   31.0000   13.5500   47.0000
    7.0000   52.0000    6.0000   33.0000
   10.4818   55.0000    7.8328   17.9362
    3.0982   71.0000   11.9491    6.0000
    1.0000   31.0000   -0.5161   44.0000
    2.0000   53.7914    5.7710   22.0000
   21.0000   47.0000    4.0000   26.0000
      ⋮
```

The ALS algorithm estimates the missing values in the data.
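The alternating idea behind ALS can be sketched in a few loops: with the right factor fixed, each row of the left factor is a small least-squares problem over that row's observed entries, and vice versa. This is a simplified illustration under several assumptions, not MATLAB's ALS implementation: the data is hypothetical, the start is a deterministic mean-filled SVD instead of random initial values, a fixed iteration count replaces the `TolFun`/`TolX` tests, and the column means are re-estimated together with the right factor.

```matlab
% Hypothetical 6-by-3 data with two entries removed
X = [2 4 1; 3 5 2; 5 9 4; 1 2 0; 4 7 3; 6 11 5];
X(2, 3) = NaN;
X(5, 1) = NaN;
[n, p] = size(X);
k = 2;                                  % number of components to fit
obs = ~isnan(X);                        % mask of observed entries

% Deterministic start: fill NaNs with column means, then a rank-k SVD
Xz = X;
Xz(~obs) = 0;
mu = sum(Xz, 1) ./ sum(obs, 1);
Xf = X;
for j = 1:p
    Xf(~obs(:, j), j) = mu(j);
end
[U, S, V] = svd(Xf - mu, 'econ');
L = U(:, 1:k) * S(1:k, 1:k);            % n-by-k left factor
R = V(:, 1:k);                          % p-by-k right factor

nIter = 200;
cost = zeros(nIter, 1);
for iter = 1:nIter
    % Least-squares update of each row of L from that row's observed entries
    for i = 1:n
        o = obs(i, :);
        L(i, :) = (R(o, :) \ (X(i, o) - mu(o))')';
    end
    % Least-squares update of each row of R and each column mean
    % from the observed entries of that column
    for j = 1:p
        o = obs(:, j);
        b = [L(o, :), ones(nnz(o), 1)] \ X(o, j);
        R(j, :) = b(1:k)';
        mu(j) = b(k + 1);
    end
    E = X - (L * R' + mu);              % NaN wherever X is missing
    cost(iter) = sum(E(obs).^2);        % squared error on observed entries
end

Xhat = L * R' + mu;                     % reconstruction, including missing cells
```

Each half-step solves its subproblem exactly, so `cost` should not increase from one iteration to the next. The removed entries were `X(2,3) = 2` and `X(5,1) = 4`, which can be compared with `Xhat(2,3)` and `Xhat(5,1)`.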

Another way to compare the results is to find the angle between the two spaces spanned by the coefficient vectors. Find the angle between the coefficients found for complete data and data with missing values using ALS.

`subspace(coeff,coeff1)`
```ans = 8.2686e-16 ```

This is a small value. It indicates that the results of `pca` on the complete data (with the default `'Rows','complete'` setting and no missing values) and of `pca` with `'algorithm','als'` on the data with missing values are close to each other.
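`subspace` returns the largest principal angle between the column spaces of its two arguments. The sketch below computes that angle from orthonormal bases of two hypothetical 2-D subspaces of R^4 and compares it with `subspace`. It also illustrates a property worth keeping in mind when reading the value above: any two full-rank 4-by-4 matrices span all of R^4, so their angle is zero regardless of their individual columns, and comparing leading columns, as the later `subspace(coeff(:,1:3),coeff2)` call does, is more discriminating.

```matlab
% Two hypothetical 2-D subspaces of R^4 that differ by a 0.3 rad rotation
A = [1 0; 0 1; 0 0; 0 0];
B = [1 0; 0 cos(0.3); 0 sin(0.3); 0 0];

% Largest principal angle: remove from an orthonormal basis of span(B)
% its projection onto span(A), then take the arcsine of the norm
Qa = orth(A);
Qb = orth(B);
theta = asin(min(1, norm(Qb - Qa * (Qa' * Qb))));

% Any two full-rank 4-by-4 matrices both span all of R^4
thetaFull = subspace(eye(4), triu(ones(4)));
```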

Perform the principal component analysis using `'Rows','complete'` name-value pair argument and display the component coefficients.

```
[coeff2,score2,latent,tsquared,explained,mu2] = pca(y,...
    'Rows','complete');
coeff2
```
```
coeff2 = 4×3

   -0.2054    0.8587    0.0492
   -0.6694   -0.3720    0.5510
    0.1474   -0.3513   -0.5187
    0.6986   -0.0298    0.6518
```

In this case, `pca` removes the rows with missing values, and `y` has only four rows with no missing values. Because the data is centered, the degrees of freedom is 4 – 1 = 3, so `pca` returns only three principal components. You cannot use the `'Rows','pairwise'` option because the covariance matrix is not positive semidefinite and `pca` returns an error message.

Find the angle between the coefficients found for complete data and data with missing values using listwise deletion (when `'Rows','complete'`).

`subspace(coeff(:,1:3),coeff2)`
```ans = 0.3576 ```

The angle between the two spaces is substantially larger. This indicates that these two results are different.

Display the estimated mean.

`mu2`
```mu2 = 1×4 7.8889 46.9091 9.8750 29.6000 ```

In this case, the mean is the mean of the available (non-`NaN`) values in each column of `y`.

Reconstruct the observed data.

`score2*coeff2'`
```
ans = 13×4

       NaN       NaN       NaN       NaN
   -7.5162  -18.3545    4.0968   22.0056
       NaN       NaN       NaN       NaN
       NaN       NaN       NaN       NaN
   -0.5644    5.3213   -3.3432    3.6040
       NaN       NaN       NaN       NaN
       NaN       NaN       NaN       NaN
       NaN       NaN       NaN       NaN
       NaN       NaN       NaN       NaN
   12.8315   -0.1076   -6.3333   -3.7758
      ⋮
```

This shows that deleting rows containing `NaN` values does not work as well as the ALS algorithm. Using ALS is better when the data has too many missing values.

Find the coefficients, scores, and variances of the principal components.

Load the sample data set.

`load hald`

The ingredients data has 13 observations for 4 variables.

Find the principal component coefficients, scores, and variances of the components for the ingredients data.

`[coeff,score,latent] = pca(ingredients)`
```
coeff = 4×4

   -0.0678   -0.6460    0.5673    0.5062
   -0.6785   -0.0200   -0.5440    0.4933
    0.0290    0.7553    0.4036    0.5156
    0.7309   -0.1085   -0.4684    0.4844
```
```
score = 13×4

   36.8218   -6.8709   -4.5909    0.3967
   29.6073    4.6109   -2.2476   -0.3958
  -12.9818   -4.2049    0.9022   -1.1261
   23.7147   -6.6341    1.8547   -0.3786
   -0.5532   -4.4617   -6.0874    0.1424
  -10.8125   -3.6466    0.9130   -0.1350
  -32.5882    8.9798   -1.6063    0.0818
   22.6064   10.7259    3.2365    0.3243
   -9.2626    8.9854   -0.0169   -0.5437
   -3.2840  -14.1573    7.0465    0.3405
      ⋮
```
```
latent = 4×1

  517.7969
   67.4964
   12.4054
    0.2372
```

Each column of `score` corresponds to one principal component. The vector, `latent`, stores the variances of the four principal components.

Reconstruct the centered ingredients data.

`Xcentered = score*coeff'`
```
Xcentered = 13×4

   -0.4615  -22.1538   -5.7692   30.0000
   -6.4615  -19.1538    3.2308   22.0000
    3.5385    7.8462   -3.7692  -10.0000
    3.5385  -17.1538   -3.7692   17.0000
   -0.4615    3.8462   -5.7692    3.0000
    3.5385    6.8462   -2.7692   -8.0000
   -4.4615   22.8462    5.2308  -24.0000
   -6.4615  -17.1538   10.2308   14.0000
   -5.4615    5.8462    6.2308   -8.0000
   13.5385   -1.1538   -7.7692   -4.0000
      ⋮
```

The new data in `Xcentered` is the original ingredients data centered by subtracting the column means from corresponding columns.

Visualize both the orthonormal principal component coefficients for each variable and the principal component scores for each observation in a single plot.

`biplot(coeff(:,1:2),'scores',score(:,1:2),'varlabels',{'v_1','v_2','v_3','v_4'});`

All four variables are represented in this biplot by a vector, and the direction and length of the vector indicate how each variable contributes to the two principal components in the plot. For example, the first principal component, which is on the horizontal axis, has positive coefficients for the third and fourth variables. Therefore, vectors `v_3` and `v_4` are directed into the right half of the plot. The largest coefficient in the first principal component is the fourth, corresponding to the variable `v_4`.

The second principal component, which is on the vertical axis, has negative coefficients for the variables `v_1`, `v_2`, and `v_4`, and a positive coefficient for the variable `v_3`.

This 2-D biplot also includes a point for each of the 13 observations, with coordinates indicating the score of each observation for the two principal components in the plot. For example, points near the left edge of the plot have the lowest scores for the first principal component. The points are scaled with respect to the maximum score value and maximum coefficient length, so only their relative locations can be determined from the plot.

Find the Hotelling’s T-squared statistic values.

Load the sample data set.

`load hald`

The ingredients data has 13 observations for 4 variables.

Perform the principal component analysis and request the T-squared values.

```[coeff,score,latent,tsquared] = pca(ingredients); tsquared```
```
tsquared = 13×1

    5.6803
    3.0758
    6.0002
    2.6198
    3.3681
    0.5668
    3.4818
    3.9794
    2.6086
    7.4818
      ⋮
```

Request only the first two principal components and display the T-squared values.

```[coeff,score,latent,tsquared] = pca(ingredients,'NumComponents',2); tsquared```
```
tsquared = 13×1

    5.6803
    3.0758
    6.0002
    2.6198
    3.3681
    0.5668
    3.4818
    3.9794
    2.6086
    7.4818
      ⋮
```

Note that even when you specify a reduced component space, `pca` computes the T-squared values in the full space, using all four components.

The T-squared value in the reduced space is the squared Mahalanobis distance of each observation in the reduced space.

`tsqreduced = mahal(score,score)`
```
tsqreduced = 13×1

    3.3179
    2.0079
    0.5874
    1.7382
    0.2955
    0.4228
    3.2457
    2.6914
    1.3619
    2.9903
      ⋮
```

Calculate the T-squared values in the discarded space by taking the difference of the T-squared values in the full space and the squared Mahalanobis distances in the reduced space.

`tsqdiscarded = tsquared - tsqreduced`
```
tsqdiscarded = 13×1

    2.3624
    1.0679
    5.4128
    0.8816
    3.0726
    0.1440
    0.2362
    1.2880
    1.2467
    4.4915
      ⋮
```
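The split into reduced and discarded parts can be computed directly from the scores and variances, without `mahal`: the scores have zero mean and diagonal covariance `diag(latent)`, so the squared Mahalanobis distance over the first k components reduces to a sum of `score.^2./latent'` over those components. A sketch with base MATLAB only, on the typed-in `hald` ingredients:

```matlab
% Hald ingredients data (13-by-4), as returned by "load hald"
X = [ 7 26  6 60;  1 29 15 52; 11 56  8 20; 11 31  8 47;
      7 52  6 33; 11 55  9 22;  3 71 17  6;  1 31 22 44;
      2 54 18 22; 21 47  4 26;  1 40 23 34; 11 66  9 12;
     10 68  8 12];
n = size(X, 1);
Xc = X - mean(X, 1);
[~, S, V] = svd(Xc, 'econ');
score  = Xc * V;
latent = diag(S).^2 / (n - 1);

% T-squared: squared scores, each divided by its component variance
tsq = sum(score.^2 ./ latent', 2);                        % full space
k = 2;
tsqReduced = sum(score(:, 1:k).^2 ./ latent(1:k)', 2);    % first k components
tsqDiscarded = tsq - tsqReduced;                          % components k+1 to p
```

Column signs do not matter here because every score enters squared.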

Find the percent variability explained by the principal components. Show the data representation in the principal components space.

Load the sample data set.

`load imports-85`

Data matrix `X` has 13 continuous variables in columns 3 to 15: wheel-base, length, width, height, curb-weight, engine-size, bore, stroke, compression-ratio, horsepower, peak-rpm, city-mpg, and highway-mpg.

Find the percent variability explained by principal components of these variables.

```[coeff,score,latent,tsquared,explained] = pca(X(:,3:15)); explained```
```
explained = 13×1

   64.3429
   35.4484
    0.1550
    0.0379
    0.0078
    0.0048
    0.0013
    0.0011
    0.0005
    0.0002
      ⋮
```

The first three components explain 99.95% of all variability.

Visualize the data representation in the space of the first three principal components.

```
scatter3(score(:,1),score(:,2),score(:,3))
axis equal
xlabel('1st Principal Component')
ylabel('2nd Principal Component')
zlabel('3rd Principal Component')
```

The data shows the largest variability along the first principal component axis. This is the largest possible variance among all possible choices of the first axis. The variability along the second principal component axis is the largest among all possible remaining choices of the second axis. The third principal component axis has the third largest variability, which is significantly smaller than the variability along the second principal component axis. The fourth through thirteenth principal component axes are not worth inspecting, because they explain only 0.05% of all variability in the data.

To skip any of the outputs, you can use `~` instead in the corresponding element. For example, if you don’t want to get the T-squared values, specify

`[coeff,score,latent,~,explained] = pca(X(:,3:15));`

## Input Arguments


Input data for which to compute the principal components, specified as an n-by-p matrix. Rows of `X` correspond to observations and columns to variables.

Data Types: `single` | `double`

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `'Algorithm','eig','Centered',false,'Rows','all','NumComponents',3` specifies that `pca` uses the eigenvalue decomposition algorithm, does not center the data, uses all of the observations, and returns only the first three principal components.

Principal component algorithm that `pca` uses to perform the principal component analysis, specified as the comma-separated pair consisting of `'Algorithm'` and one of the following.

| Value | Description |
| --- | --- |
| `'svd'` | Default. Singular value decomposition (SVD) of `X`. |
| `'eig'` | Eigenvalue decomposition (EIG) of the covariance matrix. The EIG algorithm is faster than SVD when the number of observations, n, exceeds the number of variables, p, but is less accurate because the condition number of the covariance matrix is the square of the condition number of `X`. |
| `'als'` | Alternating least squares (ALS) algorithm. This algorithm finds the best rank-k approximation by factoring `X` into an n-by-k left factor matrix, L, and a p-by-k right factor matrix, R, where k is the number of principal components. The factorization uses an iterative method starting with random initial values. ALS is designed to better handle missing values. It is preferable to pairwise deletion (`'Rows','pairwise'`) and deals with missing values without listwise deletion (`'Rows','complete'`). It can work well for data sets with a small percentage of missing data at random, but might not perform well on sparse data sets. |

Example: `'Algorithm','eig'`
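The accuracy remark for `'eig'` can be checked numerically: forming the covariance matrix squares the condition number of the centered data, even though both routes give the same variances up to rounding on this example. The nearly collinear 5-by-2 matrix below is hypothetical.

```matlab
% Hypothetical, nearly collinear data (second column close to the first)
X = [1 1.001; 2 2.002; 3 2.999; 4 4.001; 5 5.000];
n = size(X, 1);
Xc = X - mean(X, 1);
C = (Xc' * Xc) / (n - 1);               % covariance matrix

% In exact arithmetic cond(Xc'*Xc) equals cond(Xc)^2
ratio = cond(Xc' * Xc) / cond(Xc)^2;

% SVD route versus EIG route for the component variances
latentSvd = svd(Xc).^2 / (n - 1);
latentEig = sort(eig(C), 'descend');
```

The smaller variance is many orders of magnitude below the larger one here, which is the regime where squaring the condition number costs accuracy.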

Indicator for centering the columns, specified as the comma-separated pair consisting of `'Centered'` and one of these logical expressions.

| Value | Description |
| --- | --- |
| `true` | Default. `pca` centers `X` by subtracting column means before computing singular value decomposition or eigenvalue decomposition. If `X` contains `NaN` missing values, `nanmean` is used to find the mean with any available data. You can reconstruct the centered data using `score*coeff'`. |
| `false` | In this case, `pca` does not center the data. You can reconstruct the original data using `score*coeff'`. |

Example: `'Centered',false`

Data Types: `logical`

Indicator for the economy size output when the degrees of freedom, d, is smaller than the number of variables, p, specified as the comma-separated pair consisting of `'Economy'` and one of these logical expressions.

| Value | Description |
| --- | --- |
| `true` | Default. `pca` returns only the first d elements of `latent` and the corresponding columns of `coeff` and `score`. This option can be significantly faster when the number of variables p is much larger than d. |
| `false` | `pca` returns all elements of `latent`. The columns of `coeff` and `score` corresponding to zero elements in `latent` are zeros. |

Note that when d < p, `score(:,d+1:p)` and `latent(d+1:p)` are necessarily zero, and the columns of `coeff(:,d+1:p)` define directions that are orthogonal to `X`.

Example: `'Economy',false`

Data Types: `logical`

Number of components requested, specified as the comma-separated pair consisting of `'NumComponents'` and a scalar integer k satisfying 0 < k ≤ p, where p is the number of original variables in `X`. When specified, `pca` returns the first k columns of `coeff` and `score`.

Example: `'NumComponents',3`

Data Types: `single` | `double`

Action to take for `NaN` values in the data matrix `X`, specified as the comma-separated pair consisting of `'Rows'` and one of the following.

| Value | Description |
| --- | --- |
| `'complete'` | Default. Observations with `NaN` values are removed before calculation. Rows of `NaN`s are reinserted into `score` and `tsquared` at the corresponding locations. |
| `'pairwise'` | This option only applies when the algorithm is `'eig'`. If you don’t specify the algorithm along with `'pairwise'`, then `pca` sets it to `'eig'`. If you specify `'svd'` as the algorithm, along with the option `'Rows','pairwise'`, then `pca` returns a warning message, sets the algorithm to `'eig'`, and continues. When you specify the `'Rows','pairwise'` option, `pca` computes the (i,j) element of the covariance matrix using the rows that have no `NaN` values in either column i or column j of `X`. Note that the resulting covariance matrix might not be positive definite. In that case, `pca` terminates with an error message. |
| `'all'` | `X` is expected to have no missing values. `pca` uses all of the data and terminates if any `NaN` value is found. |

Example: `'Rows','pairwise'`

Observation weights, specified as the comma-separated pair consisting of `'Weights'` and a vector of length n containing all positive elements.

Data Types: `single` | `double`

Variable weights, specified as the comma-separated pair consisting of `'VariableWeights'` and one of the following.

| Value | Description |
| --- | --- |
| row vector | Vector of length p containing all positive elements. |
| `'variance'` | The variable weights are the inverse of sample variance. If you also assign weights to observations using `'Weights'`, then the variable weights become the inverse of weighted sample variance. If `'Centered'` is set to `true` at the same time, the data matrix `X` is centered and standardized. In this case, `pca` returns the principal components based on the correlation matrix. |

Example: `'VariableWeights','variance'`

Data Types: `single` | `double` | `char` | `string`

Initial value for the coefficient matrix `coeff`, specified as the comma-separated pair consisting of `'Coeff0'` and a p-by-k matrix, where p is the number of variables, and k is the number of principal components requested.

### Note

You can use this name-value pair only when `'algorithm'` is `'als'`.

Data Types: `single` | `double`

Initial value for scores matrix `score`, specified as a comma-separated pair consisting of `'Score0'` and an n-by-k matrix, where n is the number of observations and k is the number of principal components requested.

### Note

You can use this name-value pair only when `'algorithm'` is `'als'`.

Data Types: `single` | `double`

Options for the iterations, specified as a comma-separated pair consisting of `'Options'` and a structure created by the `statset` function. `pca` uses the following fields in the options structure.

| Field Name | Description |
| --- | --- |
| `'Display'` | Level of display output. Choices are `'off'`, `'final'`, and `'iter'`. |
| `'MaxIter'` | Maximum number of steps allowed. The default is 1000. Unlike in optimization settings, reaching the `MaxIter` value is regarded as convergence. |
| `'TolFun'` | Positive number giving the termination tolerance for the cost function. The default is 1e-6. |
| `'TolX'` | Positive number giving the convergence threshold for the relative change in the elements of the left and right factor matrices, L and R, in the ALS algorithm. The default is 1e-6. |

### Note

You can use this name-value pair only when `'algorithm'` is `'als'`.

You can change the values of these fields and specify the new structure in `pca` using the `'Options'` name-value pair argument.

Example:
```
opt = statset('pca');
opt.MaxIter = 2000;
coeff = pca(X,'Options',opt);
```

Data Types: `struct`

## Output Arguments


Principal component coefficients, returned as a p-by-p matrix. Each column of `coeff` contains coefficients for one principal component. The columns are in the order of descending component variance, `latent`.

Principal component scores, returned as a matrix. Rows of `score` correspond to observations, and columns to components.

Principal component variances, that is, the eigenvalues of the covariance matrix of `X`, returned as a column vector.

Hotelling’s T-Squared Statistic, which is the sum of squares of the standardized scores for each observation, returned as a column vector.

Percentage of the total variance explained by each principal component, returned as a column vector.

Estimated means of the variables in `X`, returned as a row vector when `Centered` is set to `true`. When `Centered` is `false`, the software does not compute the means and returns a vector of zeros.

## More About


### Hotelling’s T-Squared Statistic

Hotelling’s T-squared statistic is a statistical measure of the multivariate distance of each observation from the center of the data set.

Even when you request fewer components than the number of variables, `pca` uses all principal components to compute the T-squared statistic (computes it in the full space). If you want the T-squared statistic in the reduced or the discarded space, do one of the following:

• For the T-squared statistic in the reduced space, use `mahal(score,score)`.

• For the T-squared statistic in the discarded space, first compute the T-squared statistic using ```[coeff,score,latent,tsquared] = pca(X,'NumComponents',k,...)```, compute the T-squared statistic in the reduced space using `tsqreduced = mahal(score,score)`, and then take the difference: `tsquared` - `tsqreduced`.

### Degrees of Freedom

The degrees of freedom, d, is equal to n – 1, if data is centered and n otherwise, where:

• n is the number of rows without any `NaN`s if you use `'Rows','complete'`.

• n is the number of rows without any `NaN`s in the column pair that has the maximum number of rows without `NaN`s if you use `'Rows','pairwise'`.
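The effect of the degrees of freedom on the number of nonzero variances can be seen with a small hypothetical data set that has fewer observations than variables: after centering, the 3-by-5 matrix has rank at most d = n – 1 = 2, so only two singular values are nonzero. This is the situation in which `'Economy',true` returns only the first d components.

```matlab
% Hypothetical "wide" data: n = 3 observations, p = 5 variables
X = [1 2 0 4 3; 2 1 5 0 1; 0 3 1 2 6];
[n, p] = size(X);
Xc = X - mean(X, 1);           % centering removes one degree of freedom
d = n - 1;                     % degrees of freedom for centered data
s = svd(Xc);                   % min(n, p) = 3 singular values
latentAll = s.^2 / d;          % variances; entries beyond d are zero up to rounding
```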

### Variable Weights

Note that when variable weights are used, the coefficient matrix is not orthonormal. Suppose the variable weights vector you used is called `varwei`, and the principal component coefficient matrix that `pca` returned is `wcoeff`. You can then calculate the orthonormal coefficients using the transformation `diag(sqrt(varwei))*wcoeff`.
