How can I scale normal-distribution CDF values to match the actual data when calculating R^2?

Hi everyone, how can I calculate R^2 between the actual data and the fitted normal distribution? The problem is that my normal-fit CDF values are on a scale of 0 to 1, and I would like to scale them so that they match the scale of the actual data (0 to 2310), because in the third-to-last step I must find the difference between the actual and the normal-predicted data.
Table = readtable("practice3.xlsx");
actual_values = Table.values;
actual_values = sort(actual_values)
% actual_values = 10×1
%   50  80  350  370  450  700  1060  1100  2000  2310
hold on
cdfplot(actual_values); % Plot the empirical CDF
normalfit = fitdist(actual_values,'Normal'); % fit the normal distribution to the data
cdf_normal = cdf('Normal', actual_values, normalfit.mu, normalfit.sigma); % fitted normal CDF evaluated at each data point
plot(actual_values,cdf_normal) % plot the fitted normal CDF
hold off
grid on
predicted_values = cdf_normal % HERE IS THE PROBLEM: cdf_normal ranges from 0 to 1; how can I scale it to match the scale of the actual data, which has a max of 2310?
% predicted_values = 10×1
%   0.1530  0.1623  0.2616  0.2701  0.3051  0.4251  0.6078  0.6274  0.9307  0.9699
% Compute R^2, which is 1 - (sum of squared residuals/total sum of squares)
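SSR = sum(predicted_values - actual_values).^2; % note: this squares the sum; the sum of squared residuals is sum((predicted_values - actual_values).^2)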
TSS = sum(((actual_values - mean(actual_values)).^2));
Rsquared = 1 - SSR/TSS % Results in an implausible R^2 value (I expected it to be between 0 and 1)
% Rsquared = -12.1334
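For reference, this is the quantity the comment in the code describes, with each residual squared before the residuals are summed. As posted, the SSR line squares the sum of the residuals instead; working through the posted numbers, that form is what reproduces the -12.1334. Both vectors also have to be on the same scale (both probabilities, or both in data units), and R^2 defined this way can legitimately be negative when the predictions fit worse than the mean.

```latex
R^2 = 1 - \frac{\mathrm{SSR}}{\mathrm{TSS}}, \qquad
\mathrm{SSR} = \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2, \qquad
\mathrm{TSS} = \sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2
```

Here y_i are the observed values, \hat{y}_i the predicted values and \bar{y} the mean of the observed values. The expression sum(predicted_values - actual_values).^2 evaluates \left(\sum_i (\hat{y}_i - y_i)\right)^2 rather than SSR.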

Answers (1)

Oguz Kaan Hancioglu on 15 Feb. 2023
I think there is a problem in your calculation: it compares the x values of the actual data with the F(x) values of the predicted (fitted) distribution.
cdfplot(actual_values); % Plot the empirical CDF
cdfplot plots the empirical CDF against your x-axis values. If you keep the handle returned by cdfplot, you can access the F(x) values of your data. Change the line to:
[h,stats] = cdfplot(actual_values); % Plot the empirical CDF
% keep the cdfplot figure open so that its handle h stays valid
Fx = h.YData;
You can then use Fx in your calculation:
% Compute R^2, which is 1 - (sum of squared residuals/total sum of squares)
SSR = sum(predicted_values - Fx).^2; % note: this squares the sum; the sum of squared residuals is sum((predicted_values - Fx).^2)
TSS = sum(((Fx - mean(Fx)).^2));
Rsquared = 1 - SSR/TSS % R^2 of the fitted CDF against the empirical CDF values
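The empirical CDF values at the data points can also be computed directly, without reading them from the plot handle. A minimal sketch, assuming the 10 sorted values from the question are hard-coded (rather than read with readtable) and contain no repeats, and using erfc for the normal CDF so it runs without the Statistics and Machine Learning Toolbox. sigma uses std's default n-1 normalization, which reproduces the predicted_values printed in the question. The variable names are illustrative.

```matlab
% Sorted data from the question (hard-coded here instead of readtable)
x = [50 80 350 370 450 700 1060 1100 2000 2310]';
n = numel(x);

% Empirical CDF at each sorted point, assuming no repeated values:
% the step reaches i/n at x(i); just before x(i) it is (i-1)/n
F_at   = (1:n)' / n;
F_prev = (0:n-1)' / n;

% Fitted normal CDF: Phi((x - mu)/sigma) = 0.5*erfc(-(x - mu)/(sigma*sqrt(2)))
mu    = mean(x);
sigma = std(x);                      % n-1 normalization
F_fit = 0.5 * erfc(-(x - mu) ./ (sigma * sqrt(2)));

% R^2 on the probability scale: square each residual, then sum
rsq = @(yhat, y) 1 - sum((yhat - y).^2) / sum((y - mean(y)).^2);
R2_at   = rsq(F_fit, F_at)
R2_prev = rsq(F_fit, F_prev)
```

Because the empirical CDF jumps at every data point, it has two natural values there, i/n and (i-1)/n. They give different R^2 values, so it is worth stating which one you compare against.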
2 comments
Macy on 15 Feb. 2023
I could not get this to work; I am getting an array of 22 Rsquared values.
Oguz Kaan Hancioglu on 15 Feb. 2023
That's caused by the cdfplot function. When you pass actual_values to cdfplot, it does not plot the values as they are; it builds the XData of the plotted line from them. If you examine h.XData, you will see that cdfplot writes each element twice and adds -Inf and +Inf at the ends, so for your 10 values XData and YData have 22 elements.
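Repeating each element is how a step function can be drawn with an ordinary line: at each data point the line needs one vertex at the old level and one at the new level, so that a vertical jump is drawn. A minimal, illustrative reconstruction, assuming cdfplot lays out its line this way for distinct values (consistent with the 22 points and the -Inf/+Inf ends described here); it is not cdfplot's source code, and the variable names are hypothetical.

```matlab
% Hypothetical cdfplot-style staircase for n sorted, distinct values:
% XData = [-Inf, x1, x1, x2, x2, ..., xn, xn, Inf]
x = [50 80 350 370 450 700 1060 1100 2000 2310];
n = numel(x);

idx    = reshape([1:n; 1:n], 1, []);   % 1 1 2 2 ... n n
XData  = [-Inf, x(idx), Inf];          % 2n+2 = 22 points
levels = (0:n) / n;                    % 0, 1/n, ..., 1
YData  = [0, 0, levels(idx + 1)];      % level 0 up to x(1), then level i/n from x(i) on

% The first copy of x(i) sits at the old level, the second at the new one
F_before = YData(2:2:2*n).';           % (i-1)/n: what Fx(2:2:20) would pick under this layout
F_after  = YData(3:2:2*n+1).';         % i/n
```

With this layout, indices 2:2:20 pick the level just before each jump and 3:2:21 the level after it. Also, a line's YData is a row vector, so transpose it before comparing it with a 10×1 column such as predicted_values; otherwise implicit expansion produces a matrix and sum returns one value per column, which is where a 1×22 result like the one in the previous comment can come from.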
You can get your values by manual indexing. The resulting vectors have the same length as actual_values and correspond to it element by element, so you can now calculate R^2 as follows:
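Fxx = Fx(2:2:20).'; % YData is a row vector; transpose to a 10×1 column like predicted_values
% Compute R^2, which is 1 - (sum of squared residuals/total sum of squares)
SSR = sum(predicted_values - Fxx).^2; % note: this squares the sum; the sum of squared residuals is sum((predicted_values - Fxx).^2)
TSS = sum(((Fxx - mean(Fxx)).^2));
Rsquared = 1 - SSR/TSS % R^2 of the fitted CDF against the empirical CDF values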
I calculated 0.9450, so it worked. However, I don't know why cdfplot uses the same element twice.
Best regards
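If you do want the comparison in data units, as the question originally asked, one alternative (a different technique from the one used in this thread) is to go the other way: instead of scaling the CDF values up, map probabilities back to data units through the fitted normal's inverse CDF, which is the comparison a normal Q-Q plot makes. A minimal sketch, assuming plotting positions p_i = (i - 0.5)/n (one common choice among several) and using erfcinv for the inverse normal CDF so no toolbox is needed.

```matlab
% Sorted data from the question (hard-coded here instead of readtable)
y = [50 80 350 370 450 700 1060 1100 2000 2310]';
n = numel(y);

mu    = mean(y);
sigma = std(y);                      % n-1 normalization

% Plotting positions (assumption): (i - 0.5)/n keeps every p strictly inside (0, 1)
p = ((1:n)' - 0.5) / n;

% Inverse normal CDF via erfcinv: Phi^{-1}(p) = -sqrt(2)*erfcinv(2p)
q = mu - sigma * sqrt(2) * erfcinv(2 * p);   % predicted values in data units

% R^2 in data units: sum of squared residuals over total sum of squares
SSR = sum((y - q).^2);
TSS = sum((y - mean(y)).^2);
Rsquared = 1 - SSR / TSS
```

Because q is in the same units as y, the residuals y - q are directly in data units; the resulting R^2 depends on the plotting-position convention you choose.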


Version

R2022b
