Explained variance for a dataset containing quantitative and qualitative data

Hi everybody,
I'm working with datasets that contain both quantitative and qualitative data. Given a subset of the data, I'm trying to determine how much of the variance of the original mixed dataset it explains. I understand that for purely numerical data I could use:
[~,~,~,~,explained] = pca(X(:,3:15));
explained
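For reference, the `explained` output of `pca` is each principal component's variance (the `latent` output) as a percentage of the total variance of the centered data. A minimal check on hypothetical random data (the matrix `X` below is made up, not the poster's dataset; `pca` requires the Statistics and Machine Learning Toolbox):

```matlab
% Hypothetical numeric data: 100 observations, 15 variables.
rng(0);
X = randn(100, 15);
[~, ~, latent, ~, explained] = pca(X(:, 3:15));

% EXPLAINED is LATENT expressed as a percentage of its sum.
explainedByHand = 100 * latent / sum(latent);
disp(max(abs(explained - explainedByHand)))   % floating-point noise only

% The total variance of the columns equals the sum of the eigenvalues.
disp(sum(var(X(:, 3:15))) - sum(latent))      % floating-point noise only
```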
However, I'm bound to mixed data, and the subset of the original dataset is given to me.
Is there an obvious solution I'm missing? I may just be lacking the expertise.
Thanks in advance!

Answers (1)

Vijeta on 2 May 2023
Hi Benjamin,
When dealing with mixed data, you can use Multiple Correspondence Analysis (MCA) for the qualitative part instead of applying PCA to everything directly. MCA is the counterpart of PCA for qualitative (categorical) variables: each variable is expanded into an indicator (one-hot) matrix with one column per category, and a correspondence analysis of that matrix yields the principal axes. Combined with standardized quantitative variables, it can be used to analyze a mixed dataset.
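A minimal sketch of MCA as correspondence analysis of the indicator matrix, written out with `svd` rather than any dedicated MCA function. The integer-coded data in `Xqual` are hypothetical (a real `categorical` column could be converted to codes first). The check on the total inertia uses the standard identity that an indicator matrix with K categories spread over Q variables has total inertia (K - Q)/Q.

```matlab
% Hypothetical qualitative data: Q = 3 categorical variables stored as
% integer category codes for n = 200 observations.
rng(1);
n = 200;
Xqual = [randi(3, n, 1), randi(2, n, 1), randi(4, n, 1)];
Q = size(Xqual, 2);

% Complete disjunctive (one-hot indicator) matrix: one column per category.
Z = [];
for j = 1:Q
    [~, ~, idx] = unique(Xqual(:, j));      % category index per observation
    Z = [Z, double(idx == 1:max(idx))];     %#ok<AGROW>
end
K = size(Z, 2);

% Correspondence analysis of Z.
P = Z / sum(Z(:));                          % correspondence matrix
r = sum(P, 2);                              % row masses (all equal to 1/n)
c = sum(P, 1);                              % column masses
S = (P - r * c) ./ sqrt(r) ./ sqrt(c);      % standardized residuals
[U, Sig, ~] = svd(S, 'econ');
inertia = diag(Sig).^2;                     % principal inertias (eigenvalues)

% Percentage of total inertia carried by each MCA axis.
totalInertia = sum(inertia);                % equals (K - Q)/Q here
explainedMCA = 100 * inertia / totalInertia;

% Row principal coordinates: observation scores on the MCA axes.
F = (U * Sig) ./ sqrt(r);
```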
First, standardize the quantitative data (z-scores). Then perform MCA on the qualitative data using MATLAB's pca function, applied to a suitably scaled indicator matrix. Next, combine the MCA scores and the standardized quantitative data into X_mca_quant and run pca on the combined data. Finally, display the explained variance using the explained output.
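A minimal sketch of these steps under stated assumptions: the data are hypothetical and random, categorical variables are stored as integer codes, and MCA is carried out with `pca` on the one-hot indicator matrix after dividing each column by the square root of its category proportion. Under that scaling, PCA gives the MCA axes, and its variance percentages match MCA's inertia percentages. `pca` requires the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical mixed data: 4 numeric columns, 2 categorical columns (codes).
rng(3);
n = 250;
Xquant = randn(n, 4);
Xqual  = [randi(3, n, 1), randi(5, n, 1)];

% 1) Standardize the quantitative data.
Xquant_std = (Xquant - mean(Xquant)) ./ std(Xquant);

% 2) MCA via pca: one-hot encode each variable, scale each indicator
%    column by 1/sqrt(category proportion); pca centers the columns.
Gs = [];
for j = 1:size(Xqual, 2)
    [~, ~, idx] = unique(Xqual(:, j));
    G  = double(idx == 1:max(idx));
    Gs = [Gs, G ./ sqrt(mean(G, 1))];       %#ok<AGROW>
end
[~, mcaScores, mcaLatent] = pca(Gs);
% Drop the null axes created by each variable's indicator constraint.
mcaScores = mcaScores(:, mcaLatent > 1e-10 * max(mcaLatent));

% 3) Combine and run PCA on the combined data.
X_mca_quant = [Xquant_std, mcaScores];
[~, ~, ~, ~, explained] = pca(X_mca_quant);
disp(explained')
```

Because every non-null MCA axis is kept, the combined PCA is closely related to FAMD, which applies PCA directly to the standardized quantitative columns together with the scaled indicator columns.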
Note that this procedure assumes the qualitative variables are nominal, i.e. categorical without a natural ordering. If some of your qualitative variables are ordinal, you may want to convert them to numerical values and treat them as quantitative before performing MCA on the rest.
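If an ordinal variable is stored as a `categorical` array with an ordered category list, `double` returns each element's position in that list. A minimal sketch with hypothetical rating levels; using these codes as numbers assumes the levels are equally spaced.

```matlab
% Hypothetical ordinal variable with levels ordered low < medium < high.
ratings = {'low'; 'high'; 'medium'; 'low'; 'high'};
levels  = {'low', 'medium', 'high'};
r = categorical(ratings, levels, 'Ordinal', true);

% double() gives each element's position in LEVELS, preserving the order.
codes = double(r);
disp(codes')   % 1 3 2 1 3
```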
1 Comment
Benjamin Lender on 4 May 2023
Hello Vijeta,
thank you for your answer! I'm currently using FAMD (Factor Analysis of Mixed Data), which combines MCA and PCA, to reduce the data. However, because of constraints in my application, I map the results back to the original variables rather than using the new dimensions.
At this point I can no longer use the explained-variance output. I'm left with a subset of my original variables and am trying to determine what portion of the original mixed dataset's variance this subset explains.
Can you help me out here?
Thanks!!
Ben
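One possible way to quantify this, offered only as a sketch and not as an established answer from this thread: code the full mixed dataset in the FAMD style (z-scored numeric columns; centered indicator columns divided by the square root of each category's proportion), then measure what share of that coded matrix's total variance is reproduced when every column is orthogonally projected onto the columns of the retained variables. The data, the column-to-variable map `varId`, and the chosen `subset` are hypothetical. Since PCA gives the best d-dimensional approximation, the first d principal components bound this share from above, where d is the dimension of the subset's column space.

```matlab
% Hypothetical mixed data: 3 numeric and 2 categorical variables (codes).
rng(2);
n = 300;
num = randn(n, 3);
num(:, 3) = num(:, 1) + 0.5 * randn(n, 1);   % make variable 3 depend on 1
qual = [randi(3, n, 1), randi(4, n, 1)];

% FAMD-style coding; varId maps columns of Z to original variables
% (1-3 numeric, 4-5 categorical).
Z = (num - mean(num)) ./ std(num);
varId = 1:3;
for j = 1:size(qual, 2)
    [~, ~, idx] = unique(qual(:, j));
    G = double(idx == 1:max(idx));           % one-hot indicator block
    p = mean(G, 1);                          % category proportions
    Z = [Z, (G - p) ./ sqrt(p)];             %#ok<AGROW>
    varId = [varId, (3 + j) * ones(1, size(G, 2))]; %#ok<AGROW>
end

% Share of the total variance of Z reproduced by the retained subset.
subset = [1 4];                              % hypothetical: keep vars 1 and 4
B = orth(Z(:, ismember(varId, subset)));     % orthonormal basis of subset span
Zhat = B * (B' * Z);                         % project every column onto it
fracExplained = 100 * sum(Zhat(:).^2) / sum(Z(:).^2);

% Upper bound: the first d principal components of Z (Z is column-centered).
s = svd(Z);
explained = 100 * s.^2 / sum(s.^2);
d = size(B, 2);
bound = sum(explained(1:d));
fprintf('Subset: %.1f%%; first %d PCs: %.1f%%\n', fracExplained, d, bound);
```

The choice of variance measure is the key assumption here: it is the total variance of the FAMD-coded matrix, so the result depends on that weighting of numeric versus categorical variables.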


Categories: Dimensionality Reduction and Feature Extraction