confidence intervals returned by predict()

William Rose on 27 Jun 2023
Commented: partika partikasiwatch on 4 Jan 2024
The predict() function returns confidence intervals (CIs) for values predicted from a model. There are four options available for the CIs. Two of the options do not give the CIs I expect. Can someone explain these unexpected results? Are my expectations wrong or is the function wrong? I will give examples, using a simple linear regression model, and I will explain what values I expect. I'm sorry this is a long post, but I did not have time to make it shorter.
Create some data and make a simple linear regression model:
x=(5:15)';
b0=0; b1=1; sigma=1; %b0=intercept, b1=slope, sigma=s.d. of random noise
y=b0+b1*x+sigma*randn(size(x));
mdl=fitlm(x,y); % model using x, y
Make predictions with confidence intervals (four options for CIs)
xnew=(0:20)';
[~,yci1] =predict(mdl,xnew,'Prediction','curve', 'Simultaneous',false);
[~,yci2] =predict(mdl,xnew,'Prediction','curve', 'Simultaneous',true);
[~,yci3] =predict(mdl,xnew,'Prediction','observation','Simultaneous',false);
[ypred,yci4]=predict(mdl,xnew,'Prediction','observation','Simultaneous',true);
Plot predictions and confidence intervals
figure
subplot(211)
plot(x,y,'k*',xnew,ypred,'-k.'); hold on
plot(xnew,yci1(:,1),'-r',xnew,yci2(:,1),'-g',xnew,yci3(:,1),'-b',xnew,yci4(:,1),'-m');
plot(xnew,yci1(:,2),'-r',xnew,yci2(:,2),'-g',xnew,yci3(:,2),'-b',xnew,yci4(:,2),'-m');
legend('Data','Prediction','curve,non-simul','curve,simul.','obs.,non-simul','obs.,simul.')
ylabel('Y'); grid on
subplot(212)
plot(xnew,yci1(:,2)-ypred,'-r',xnew,yci2(:,2)-ypred,'-g',...
xnew,yci3(:,2)-ypred,'-b',xnew,yci4(:,2)-ypred,'-m');
legend('curve,non-simul','curve,simul.','obs.,non-simul','obs.,simul.')
xlabel('X'); ylabel('C.I. Half-width'); grid on
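I wish the Matlab help explained the following, which took me some work to figure out: The four different CIs returned by predict() follow the general formula
$$\hat{y} \pm c \cdot SE$$
where SE varies depending on the 'Prediction' option, and c varies depending on the 'Simultaneous' option.
When predict() is called with 'Prediction','curve', SE is given by
$$SE = s\sqrt{\frac{1}{n} + \frac{(x_{new}-\bar{x})^2}{S_{xx}}}$$
where n is the number of data points, $\bar{x}$ is the mean of the x data, and
$$s = \sqrt{\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{n-2}}, \qquad S_{xx} = \sum_{i=1}^{n}(x_i-\bar{x})^2$$
When predict() is called with 'Prediction','observation', SE is given by
$$SE = s\sqrt{1 + \frac{1}{n} + \frac{(x_{new}-\bar{x})^2}{S_{xx}}}$$
When predict() is called with 'Simultaneous',false, c (for simple linear regression) is given by
$$c = t_{(1+p)/2,\,n-2}$$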
where p is the CI probability, 0.95 by default. The critical value of the t statistic can be obtained in Matlab with c=tinv((1+p)/2,n-2). In the example here, p=0.95 and n=11, therefore c=tinv(.975,9)=2.2622. The formulas above produce CIs that agree with the CIs of predict(), when Simultaneous is false. These CIs are plotted in red and blue above.
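The agreement described above can be checked with a short script. This is a sketch under stated assumptions: the fixed random seed and the variable names (seCurve, seObs, errCurve, errObs) are mine, and the residual standard deviation s is taken from the model's RMSE property.

```matlab
% Sketch: rebuild the non-simultaneous CIs by hand and compare with predict().
% Requires the Statistics and Machine Learning Toolbox.
rng(0);                                   % fixed seed (an assumption, for repeatability)
x = (5:15)';
y = 0 + 1*x + 1*randn(size(x));           % b0=0, b1=1, sigma=1, as in the question
mdl  = fitlm(x, y);
xnew = (0:20)';

p   = 0.95;                               % CI probability
n   = numel(x);
s   = mdl.RMSE;                           % residual s.d., sqrt(SSE/(n-2))
Sxx = sum((x - mean(x)).^2);
c   = tinv((1+p)/2, n-2);                 % non-simultaneous multiplier

% Standard errors for the fitted curve and for a new observation
seCurve = s * sqrt(    1/n + (xnew - mean(x)).^2 / Sxx);
seObs   = s * sqrt(1 + 1/n + (xnew - mean(x)).^2 / Sxx);

[ypred, ciCurve] = predict(mdl, xnew, 'Prediction','curve',       'Simultaneous',false);
[~,     ciObs]   = predict(mdl, xnew, 'Prediction','observation', 'Simultaneous',false);

% Largest absolute difference between the hand-built and returned bounds
errCurve = max(abs(ciCurve - [ypred - c*seCurve, ypred + c*seCurve]), [], 'all');
errObs   = max(abs(ciObs   - [ypred - c*seObs,   ypred + c*seObs]),   [], 'all');
fprintf('max difference, curve: %.2g; observation: %.2g\n', errCurve, errObs);
```

Both differences should be at the level of floating-point round-off if the formulas match what predict() does when Simultaneous is false.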
When Simultaneous is true, the results are not what I expect. I expect the CIs (which, according to the Matlab Help, are by Scheffe's method) to be (see here and here; these sources use different notation, but they appear to agree):
$$\hat{y} \pm c \cdot SE, \qquad c = \sqrt{d\,F_{p;\,d,\,n-2}}$$
where d is the number of independent new x values for simultaneous prediction. In the examples plotted above, d=21, because length(xnew)=21. Therefore we expect c=sqrt(21*finv(.95,21,9))=7.8391, so the CIs should be wider by a uniform factor of 7.84/2.26=3.47 when Simultaneous is true. But the CIs are only wider by a factor of 1.2898. (The ratio of CI widths is the same when 'Prediction','observation' is used.) Why the discrepancy?
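The multipliers in this comparison can be computed directly. The first two values below restate the question's own arithmetic. The third, sqrt(2*finv(p,2,n-2)), is a candidate explanation that is not stated anywhere in this thread: it is the multiplier for a band covering the whole fitted line (k=2 coefficients), sometimes called the Working-Hotelling band, and it reproduces the observed factor of 1.2898. Whether predict() actually computes its simultaneous bounds this way is an assumption. The seed and variable names are also mine.

```matlab
% Sketch: expected vs. observed widening factor for simultaneous CIs.
% Requires the Statistics and Machine Learning Toolbox.
p = 0.95;  n = 11;  d = 21;  k = 2;       % k = number of coefficients (assumption)

cT         = tinv((1+p)/2, n-2);          % non-simultaneous multiplier, 2.2622
cExpected  = sqrt(d * finv(p, d, n-2));   % multiplier expected in the question, 7.8391
cCandidate = sqrt(k * finv(p, k, n-2));   % whole-line band multiplier (candidate)

fprintf('expected factor:  %.4f\n', cExpected  / cT);
fprintf('candidate factor: %.4f\n', cCandidate / cT);

% Observed factor from predict(), using the same setup as the question
rng(0);                                   % fixed seed (assumption)
x = (5:15)';  y = x + randn(size(x));
mdl  = fitlm(x, y);
xnew = (0:20)';
[ypred, ciNon] = predict(mdl, xnew, 'Prediction','curve', 'Simultaneous',false);
[~,     ciSim] = predict(mdl, xnew, 'Prediction','curve', 'Simultaneous',true);
ratioObserved  = (ciSim(:,2) - ypred) ./ (ciNon(:,2) - ypred);
fprintf('observed factor:  %.4f to %.4f\n', min(ratioObserved), max(ratioObserved));
```

If the candidate is right, the observed factor does not depend on the number of points in xnew, which would also explain why a single predicted value still gets a wider simultaneous interval.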
The confidence interval, when predicting a single value with 'Simultaneous',true, is also not what we expect. When predicting a single value, d=1, and c simplifies to $c = \sqrt{F_{p;\,1,\,n-2}}$, where p is the CI probability. This is identical to the non-simultaneous multiplier, $c = t_{(1+p)/2,\,n-2}$, due to the relationship between F and t distributions. It makes sense that the simultaneous and non-simultaneous CIs would be the same when there is only one value being predicted "simultaneously". But the CIs returned by predict() are not the same, when one value is being predicted. See example below.
xnew=10;
[ypred1,yci1]=predict(mdl,xnew,'Prediction','curve','Simultaneous',false);
[ypred2,yci2]=predict(mdl,xnew,'Prediction','curve','Simultaneous',true);
fprintf('CI, non-simultaneous: %.2f to %.2f; half-width %.2f\n',yci1,yci1(2)-ypred1)
CI, non-simultaneous: 9.17 to 10.89; half-width 0.86
fprintf('CI, simultaneous: %.2f to %.2f; half-width %.2f\n',yci2,yci2(2)-ypred2)
CI, simultaneous: 8.93 to 11.14; half-width 1.11
Why are the CIs not the same?
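The F-to-t relationship used in the single-value argument above is easy to confirm numerically. A minimal sketch, with variable names (cF, cT, ratioReported) chosen here for illustration:

```matlab
% Sketch: with d=1 the Scheffe-style multiplier reduces to the t critical value,
% because the square of a t variable with v d.o.f. follows F(1,v).
p = 0.95;  n = 11;
cF = sqrt(finv(p, 1, n-2));               % sqrt of F critical value, d=1
cT = tinv((1+p)/2, n-2);                  % two-sided t critical value
fprintf('sqrt(F) = %.4f, t = %.4f\n', cF, cT);

% Ratio of the half-widths printed in the example above
ratioReported = 1.11 / 0.86;
fprintf('reported half-width ratio: %.2f\n', ratioReported);
```

The two multipliers agree, so the expectation in the question is internally consistent. The reported half-widths differ by a factor of about 1.29, the same factor seen in the 21-point example, which suggests predict() does not use the number of new x values as d.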

Accepted Answer

ProblemSolver on 27 Jun 2023
Edited: ProblemSolver on 27 Jun 2023
Hello William,
The discrepancies you observe in the confidence intervals returned by the predict function can be attributed to the different methods employed for simultaneous prediction and the specific case of predicting a single value.
When the 'Simultaneous' option is set to true, the predict function calculates the confidence intervals using Scheffe's method, which assumes that all predicted values are correlated and adjusts the intervals accordingly. However, for the specific case of predicting a single value, the correlation among predictions is not applicable, leading to different results between simultaneous and non-simultaneous predictions.
In the case of simultaneous prediction with multiple values, the factor by which the confidence interval is widened compared to the non-simultaneous case depends on the number of independent new x values (denoted as d in your explanation). The formula you provided,
$$c = \sqrt{d\,F_{p;\,d,\,n-2}}$$
is correct for calculating the scaling factor of the confidence interval width.
Regarding the case of predicting a single value, the simultaneous and non-simultaneous confidence intervals should indeed be the same since there is no correlation among predictions. However, the predict function in MATLAB calculates the confidence intervals differently for the two cases, resulting in discrepancies.
To obtain the expected confidence intervals for the simultaneous prediction of a single value, you can manually calculate the non-simultaneous confidence interval using the formula you mentioned:
CI = ypred ± tinv((1+p)/2, n-2) * SE,
where SE is calculated based on the 'Prediction' option.
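The manual calculation suggested above might look like the following for a single new x value. This is only a sketch: it takes SE from the model's coefficient covariance matrix rather than from the closed-form expressions in the question, and the seed and variable names (x0, X0, ciManual) are assumptions made for the example.

```matlab
% Sketch: manual non-simultaneous CI at one new x value.
% Requires the Statistics and Machine Learning Toolbox.
rng(0);                                   % fixed seed (assumption)
x = (5:15)';  y = x + randn(size(x));
mdl = fitlm(x, y);

p  = 0.95;
x0 = 10;                                  % new x value
X0 = [1 x0];                              % design row: intercept and slope terms

yhat    = X0 * mdl.Coefficients.Estimate;               % predicted value
seCurve = sqrt(X0 * mdl.CoefficientCovariance * X0');   % SE of the fitted curve
seObs   = sqrt(seCurve^2 + mdl.MSE);                    % SE for a new observation
c       = tinv((1+p)/2, mdl.DFE);                       % DFE = n-2 here

ciManual    = yhat + [-1 1] * c * seCurve;              % 'Prediction','curve'
ciManualObs = yhat + [-1 1] * c * seObs;                % 'Prediction','observation'

[~, ciPredict] = predict(mdl, x0, 'Prediction','curve', 'Simultaneous',false);
fprintf('manual: %.4f to %.4f; predict(): %.4f to %.4f\n', ciManual, ciPredict);
```

Using the covariance matrix has the advantage of carrying over unchanged to models with more than one predictor.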
I hope this helps.
7 comments
William Rose on 2 Jan 2024
You're welcome. I predict that the CIs obtained with predint(), with appropriate options for intopt and simopt, will match the CIs from predict(), with corresponding options for Prediction and Simultaneous. You will have to investigate to be sure.
partika partikasiwatch on 4 Jan 2024
Yes, I checked, and the CI values match for both functions with the right options for prediction and simultaneous.
Again, thanks for the answer.
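The predint() comparison discussed in these comments could be set up roughly as follows. The fit type ('poly1'), the pairing of 'functional' with 'curve', and the seed are assumptions, since the comments do not say which settings were used. Only the curve case is shown.

```matlab
% Sketch: compare predict() (fitlm) with predint() (fit) for the curve/functional case.
% Requires both the Statistics and Machine Learning Toolbox and the Curve Fitting Toolbox.
rng(0);                                   % fixed seed (assumption)
x = (5:15)';  y = x + randn(size(x));
xnew = (0:20)';

mdl = fitlm(x, y);                        % linear model object
f   = fit(x, y, 'poly1');                 % same straight-line model as a cfit object

simFlags = [false true];                  % 'Simultaneous' values for predict()
simOpts  = {'off', 'on'};                 % matching simopt values for predint()
for k = 1:2
    [~, ciLm] = predict(mdl, xnew, 'Prediction','curve', 'Simultaneous',simFlags(k));
    ciFit     = predint(f, xnew, 0.95, 'functional', simOpts{k});
    fprintf('simopt=%s: max |predict - predint| = %.3g\n', ...
            simOpts{k}, max(abs(ciLm - ciFit), [], 'all'));
end
```

A printed difference near zero for both settings would be consistent with the agreement reported above.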
