Difference between regress function and basic fitting wizard

Victoria Dutch on 21 Nov 2023
Answered: Sulaymon Eshkabilov on 21 Nov 2023
I have 3-D matrices for two variables, A and B. A contains measured values that are incomplete in space and time, so it has a large number of NaN values. B is a modelled value of A, with far fewer NaN values. I would like an equation of the form B = mA + c, so that I can see the predicted value of B for a given value of A. I have used the regress function, first converting each matrix to a column as follows:
v = reshape(A,[],1);
onez = ones(length(v),1);
v_regress = horzcat(onez,v);
u = reshape(B,[],1);
[w,x,y,~,z] = regress(u,v_regress);
I have also made a scatter plot (using the scatter function) of the column versions of A and B (i.e., v and u), and then applied a linear fit with the Basic Fitting tool. The resulting fit line looks wildly different from (and much better than) the line plotted from the regress output. Additionally, regress gives a numerical R^2 value, whereas the Basic Fitting tool gives an R^2 value of NaN.
What does the linear fit option in the Basic Fitting tool compute differently from the regress function?

Answers (1)

Sulaymon Eshkabilov on 21 Nov 2023
Both should produce the same result: R^2 is the same for both. See, for example, this comparison of regress with fitlm:
A = randi([-13, 13], 20, 5);
B = randi([-130, 130], 20, 5);
IDXA_1 = randi(10, 7,1);
IDXA_2 = randi(5, 7,1);
for ii=1:numel(IDXA_1)
A(IDXA_1(ii), IDXA_2(ii)) = NaN; % A contains some NaNs
end
IDXB_1 = randi(10, 5,1);
IDXB_2 = randi(5, 5,1);
for ii=1:numel(IDXB_1)
B(IDXB_1(ii), IDXB_2(ii)) = NaN; % B contains some NaNs
end
v = reshape(A,[],1);
onez = ones(length(v),1);
v_regress = horzcat(onez,v);
u = reshape(B,[],1);
[w,x,y,~,z] = regress(u,v_regress);
R2=z(1);
disp(R2)
0.0171
MDL = fitlm(v_regress,u)
Warning: Regression design matrix is rank deficient to within machine precision.
MDL =
Linear regression model:
    y ~ 1 + x1 + x2
Estimated Coefficients:
                   Estimate      SE       tStat     pValue
                   ________    ______    ______    _______
    (Intercept)          0          0       NaN        NaN
    x1              13.208     8.2415    1.6026    0.11269
    x2             -1.2607     1.0258    -1.229    0.22242
Number of observations: 89, Error degrees of freedom: 87
Root Mean Squared Error: 77.4
R-squared: 0.0171, Adjusted R-Squared: 0.00577
F-statistic vs. constant model: 1.51, p-value = 0.222
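The rank-deficiency warning above comes from fitlm adding its own intercept term by default, so the ones column in v_regress duplicates it; this is also why the output lists an (Intercept) row of zeros next to x1 and x2. A minimal sketch with toy data (the names x, X, and Xdup are illustrative, not from the original post) shows the effect using only rank:

```matlab
x = [1; 2; 4; 5; 7];                 % toy predictor values (illustrative)
X = [ones(numel(x), 1), x];          % design matrix as built for regress
Xdup = [ones(numel(x), 1), X];       % same matrix with a second constant column added

rankX = rank(X);                     % two independent columns
rankDup = rank(Xdup);                % the extra constant column adds no new information

fprintf('columns: %d, rank: %d\n', size(Xdup, 2), rankDup);
% With fitlm, passing the predictor alone, e.g. fitlm(v, u), avoids the duplicate column.
```

The R^2 value is unaffected by the duplicate column, which is why it still matches the regress result.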
% Alternatively, consider filling in the NaNs (here with fillmissing)
A = randi([-13, 13], 20, 5);
B = randi([-130, 130], 20, 5);
IDXA_1 = randi(10, 7,1);
IDXA_2 = randi(5, 7,1);
for ii=1:numel(IDXA_1)
A(IDXA_1(ii), IDXA_2(ii)) = NaN; % A contains some NaNs
end
AF = fillmissing(A,'movmedian',10); % NaNs in A are substituted with moving median of 10 points
IDXB_1 = randi(10, 5,1);
IDXB_2 = randi(5, 5,1);
for ii=1:numel(IDXB_1)
B(IDXB_1(ii), IDXB_2(ii)) = NaN; % B contains some NaNs
end
BF = fillmissing(B,'movmedian',10); % NaNs in B are substituted with moving median of 10 points
v = reshape(AF,[],1);
onez = ones(length(v),1);
v_regress = horzcat(onez,v);
u = reshape(BF,[],1);
[w,x,y,~,z] = regress(u,v_regress);
R2=z(1);
disp(R2)
0.0029
MDL = fitlm(v_regress,u)
Warning: Regression design matrix is rank deficient to within machine precision.
MDL =
Linear regression model:
    y ~ 1 + x1 + x2
Estimated Coefficients:
                   Estimate      SE        tStat      pValue
                   ________    ______    _______    _______
    (Intercept)          0          0        NaN        NaN
    x1               4.408     7.7121    0.57157    0.56893
    x2             0.53641     1.0037    0.53445    0.59425
Number of observations: 100, Error degrees of freedom: 98
Root Mean Squared Error: 77.1
R-squared: 0.00291, Adjusted R-Squared: -0.00727
F-statistic vs. constant model: 0.286, p-value = 0.594
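Another option, sketched here under stated assumptions, is to drop every pair in which either value is NaN and fit only the complete pairs. regress already omits observations that contain NaN, so fitting the same complete pairs with polyfit should give matching coefficients and a numeric R^2. The toy vectors and the names ok, vc, uc, beta, p, and R2 are illustrative, not from the original post, and the backslash solve stands in for regress so that the sketch needs no toolbox:

```matlab
% Toy data standing in for A(:) and B(:) (illustrative values only)
v = [1; 2; NaN; 4; 5; 6; NaN; 8];
u = [2.2; 3.9; 6.1; NaN; 10.3; 11.8; 14.1; 16.2];

% Keep only the pairs where both values are present
ok = ~isnan(v) & ~isnan(u);
vc = v(ok);
uc = u(ok);

% Least-squares fit u = c + m*v using a design matrix with a ones column
X = [ones(numel(vc), 1), vc];
beta = X \ uc;                 % beta(1) = intercept c, beta(2) = slope m

% The same straight line from polyfit (highest power first: [m, c])
p = polyfit(vc, uc, 1);

% R^2 computed from the residuals of the fitted line
res = uc - X * beta;
R2 = 1 - sum(res.^2) / sum((uc - mean(uc)).^2);

fprintf('m = %.4f, c = %.4f, R^2 = %.4f\n', beta(2), beta(1), R2);
```

Plotting the line from beta (or p) over scatter(vc, uc) makes it easy to check by eye whether it agrees with the line drawn by the Basic Fitting tool.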

Release

R2020a
