Calculating principal component scores from principal component coefficients of the new data

Hi all,
I performed a PCA on a dataset using the function
[coeff,score,latent,~,explained,mu]=pca(TrainingSet.X);
Then I generated new shapes (in Cartesian space) using a reduced number of principal components. Now I need to find the principal component scores for these new shapes, but I can't figure out how!
Based on the fact that the original centered training data can be retrieved using
centeredData= score*coeff'
I used the following statements, which did not generate relevant results.
for i = 1:newShapesNum
    newShapeScore(i,:) = newShape(i,:)*pinv(coeff(:,1:shapeModesNum)'); % i is the counter of new (generated) observations.
    newSvalid = newShapeScore(i,:)*coeff(:,1:shapeModesNum)';
end
UPDATE
I also tried running PCA on the new instances and requested [score] and [coeff]. The mean shape looked good, but the centeredData formula above did not regenerate the original shape, and I don't understand why.
I'd appreciate your help in finding the principal component scores for the new shapes.
Many thanks
Amin
2 comments
Aditya Patil on 11 May 2021
Can you elaborate on the issue? Are you trying to convert new data as per the pca transformation? Or is the issue that pca transformation of new data is leading to poor results?
Amin Kassab-Bachi on 11 May 2021
Thanks for responding. Actually, the new instances I'm creating are of good quality, but this is my first time working with PCA, so I'm not familiar with the terms. The new instances (in Cartesian space) are created from randomly generated standard deviation values. I'm trying to recover their scores in principal component space because I need to correlate the scores with some output from another analysis later on. After many tests I finally came to the conclusion that the scores are the standard deviation values I used. So for each principal component, for each new instance, I saved the generated SD [i.e. a random weight×sqrt(latent)]. Hopefully you can confirm this is correct.
Thanks
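The conclusion in this comment can be checked numerically. The lines below are only a sketch under stated assumptions: X is random stand-in data (not TrainingSet.X), and nNew, w and b are hypothetical names introduced for the illustration. Because the columns of coeff are orthonormal, projecting a shape built as mu + b*coeff(:,1:k)' back onto the same k modes should return b up to round-off.

```matlab
% Sketch: build new shapes from k modes, then recover their scores.
rng(0);                                   % reproducible random numbers
X = rand(100, 5);                         % stand-in training data (assumption)
[coeff, ~, latent, ~, ~, mu] = pca(X);

k    = 3;                                 % modes kept (shapeModesNum in the question)
nNew = 4;                                 % number of generated shapes (hypothetical)
w = randn(nNew, k);                       % random weights, in standard deviations
b = w .* sqrt(latent(1:k))';              % mode amplitudes: weight * sqrt(latent)

newShape      = mu + b * coeff(:, 1:k)';            % shapes in Cartesian space
newShapeScore = (newShape - mu) * coeff(:, 1:k);    % project back onto the modes

maxDiff = max(abs(newShapeScore(:) - b(:)));        % expected to be round-off only
```

If maxDiff is at round-off level, the saved weight×sqrt(latent) values are the scores of the generated shapes with respect to the training PCA.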


Accepted answer

Aditya Patil on 12 May 2021
To get the scores for new data, you first need the outputs mu and coeff from running pca on the training data.
X = rand(100, 5);
XTrain = X(1:75, :)
XTrain = 75×5
    0.1441    0.3071    0.3775    0.8840    0.6683
    0.8057    0.3544    0.5524    0.7381    0.9861
    0.7959    0.0033    0.3544    0.6425    0.4665
    0.9191    0.7689    0.0454    0.1116    0.5821
    0.7176    0.1236    0.6015    0.8224    0.3409
    0.2391    0.1492    0.9006    0.5579    0.6631
    0.1738    0.4541    0.5185    0.6817    0.8653
    0.6194    0.2851    0.5203    0.8938    0.2486
    0.0550    0.3670    0.9562    0.1952    0.4238
    0.2783    0.3371    0.4914    0.6739    0.2944
XTest = X(76:100,:)
XTest = 25×5
    0.4050    0.8916    0.0311    0.9368    0.4693
    0.4280    0.2849    0.0614    0.1172    0.3371
    0.9347    0.9498    0.3593    0.3842    0.0361
    0.6781    0.4363    0.2563    0.5025    0.2534
    0.6973    0.2147    0.0580    0.2153    0.6004
    0.9774    0.1824    0.5365    0.0387    0.3407
    0.6281    0.8394    0.6062    0.0771    0.7966
    0.1263    0.8900    0.5766    0.7521    0.1489
    0.4293    0.8312    0.9448    0.5362    0.1901
    0.4643    0.9553    0.6214    0.8245    0.4738
[coeff,scoreTrain,~,~,explained,mu] = pca(XTrain);
Now, to apply the same transformation, that is to get scores for new data, apply the following equation.
idx = 3; % Keep 3 principal components
scoreTest = (XTest-mu)*coeff(:,1:idx)
scoreTest = 25×3
    0.1243    0.3578    0.3699
    0.2510   -0.1932   -0.3583
    0.5351   -0.2519    0.0646
    0.1803   -0.2631    0.0597
    0.3561   -0.1946   -0.0985
    0.3395   -0.6057   -0.2079
    0.3735    0.2247   -0.2527
   -0.2488    0.1930   -0.0451
   -0.1706   -0.0489   -0.1127
   -0.0553    0.2642    0.2388
For more details, see the Apply PCA to New Data and Generate C/C++ Code documentation.
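As a sanity check on the projection formula, the following sketch (using fresh random data, so the numbers will differ from the output shown above) compares it against the scores that pca itself returns for the training set, and then maps the test scores back to the original space. The names errTrain, XTestApprox, scoreFull and errFull are hypothetical and only serve the illustration.

```matlab
% Sketch: verify (X - mu)*coeff against pca's own scores, then reconstruct.
rng(1);                                   % reproducible random numbers
X = rand(100, 5);
XTrain = X(1:75, :);
XTest  = X(76:100, :);
[coeff, scoreTrain, ~, ~, ~, mu] = pca(XTrain);

% Check 1: the formula reproduces the training scores returned by pca.
dTrain   = (XTrain - mu) * coeff - scoreTrain;
errTrain = max(abs(dTrain(:)));           % expected to be round-off only

% Check 2: keeping idx components gives an approximation of XTest ...
idx = 3;
scoreTest   = (XTest - mu) * coeff(:, 1:idx);
XTestApprox = scoreTest * coeff(:, 1:idx)' + mu;

% ... while keeping all components recovers XTest up to round-off.
scoreFull = (XTest - mu) * coeff;
dFull     = scoreFull * coeff' + mu - XTest;
errFull   = max(abs(dFull(:)));
```

Note that the reconstruction adds mu back at the end; score*coeff' alone only returns the centered data.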
1 comment
Amin Kassab-Bachi on 12 May 2021
Edited: Amin Kassab-Bachi on 12 May 2021
Thank you very much. This also confirmed what I calculated was correct. When testing my results previously I did not include mu, so the results did not look like anything useful! But now it's all starting to make more sense. Thanks.
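To illustrate the effect of leaving out mu, here is a small sketch on random stand-in data (an assumption, not the original shapes). It suggests two things: projecting uncentered data shifts every score row by the same constant vector mu*coeff(:,1:k), and pinv(coeff(:,1:k)') is numerically the same matrix as coeff(:,1:k) because the columns of coeff are orthonormal, so the pinv call in the question's loop was not the problem. The names V, scoreNoMu, offsetErr and pinvDiff are hypothetical.

```matlab
% Sketch: what changes when mu is not subtracted before projecting.
rng(2);                                   % reproducible random numbers
X = rand(100, 5);                         % stand-in data (assumption)
[coeff, ~, ~, ~, ~, mu] = pca(X);
k = 3;
V = coeff(:, 1:k);                        % first k modes

newShape   = X(1:5, :);                   % any rows in the original space
scoreRight = (newShape - mu) * V;         % centered projection
scoreNoMu  = newShape * V;                % projection without centering

% Every row of the difference equals the constant row vector mu*V.
dOffset   = (scoreNoMu - scoreRight) - mu * V;
offsetErr = max(abs(dOffset(:)));

% pinv of the transposed mode matrix matches the mode matrix itself.
P        = pinv(V');
pinvDiff = max(abs(P(:) - V(:)));
```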
