Regression analysis in Matlab

How can I fit a model to predict a response variable (y) from a set of regressor variables (x1, x2, x3, x4, x5, x6)? The model may or may not be linear. A sample of the simulation data is:
x1=[263,268,273,278,283,288,293,298,303,308,313,318,263,268,273,278,283,288,293];
x2=[323,333,343,353,363,373,343,423,433,473,323,443,463,493,353,363,383,403,453];
x3=[10,20,50,40,20,10,30,40,50,40,30,20,20,10,20,30,40,40,20];
x4=[0.83,0.88,0.77,0.83,0.84,0.87,0.71,0.84,0.63,0.69,0.83,0.50,0.88,0.83,0.97,0.83,0.96,0.83,0.78];
x5=[0.00101325,1.01325,0.000101325,0.101325,1.01325,0.000101325,0.101325,0.0101325,0.000101325,0.101325,0.0101325,0.000101325,0.101325,0.0101325,0.000101325,0.00101325,0.101325,1.01325,1.01325];
x6=[0.05,0.06,0.06,0.07,0.08,0.07,0.09,0.1,0.06,0.05,0.04,0.08,0.09,0.1,0.07,0.06,0.06,0.08,0.05];
y=[257.98,262.99,268.05,273.17,278.35,283.59,288.9,294.29,299.75,305.3,310.93,316.64,258.22,263.23,268.29,273.4,278.58,283.82,289.12];
Please advise me.
T. Aseri


dpb on 28 Dec 2013
If you have the Statistics Toolbox, see
doc regress
Without it, see
doc slash % NB: the backslash operator '\'
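For anyone without the toolbox, here is a minimal base-Matlab sketch of the backslash approach on the sample data above (an illustration, not part of the original reply). For a tall matrix X, X\y returns the least-squares solution, so the residual is orthogonal to every column of X. The column of ones supplies an intercept; the names X, b and r are arbitrary.

```matlab
% Sample data from the question (row vectors; transposed to columns below)
x1 = [263,268,273,278,283,288,293,298,303,308,313,318,263,268,273,278,283,288,293];
x2 = [323,333,343,353,363,373,343,423,433,473,323,443,463,493,353,363,383,403,453];
x3 = [10,20,50,40,20,10,30,40,50,40,30,20,20,10,20,30,40,40,20];
x4 = [0.83,0.88,0.77,0.83,0.84,0.87,0.71,0.84,0.63,0.69,0.83,0.50,0.88,0.83,0.97,0.83,0.96,0.83,0.78];
x5 = [0.00101325,1.01325,0.000101325,0.101325,1.01325,0.000101325,0.101325,0.0101325,0.000101325,0.101325,0.0101325,0.000101325,0.101325,0.0101325,0.000101325,0.00101325,0.101325,1.01325,1.01325];
x6 = [0.05,0.06,0.06,0.07,0.08,0.07,0.09,0.1,0.06,0.05,0.04,0.08,0.09,0.1,0.07,0.06,0.06,0.08,0.05];
y  = [257.98,262.99,268.05,273.17,278.35,283.59,288.9,294.29,299.75,305.3,310.93,316.64,258.22,263.23,268.29,273.4,278.58,283.82,289.12];

X = [ones(numel(x1),1) x1' x2' x3' x4' x5' x6'];  % design matrix: intercept + six regressors
b = X \ y';                                       % least-squares coefficients (b(1) = intercept)
r = y' - X*b;                                     % residuals
disp(b')
```

The same call works for any subset of regressors; just drop the corresponding columns from X.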


Answers (2)

dpb on 28 Dec 2013
Edited: dpb on 29 Dec 2013

Now that I have Matlab open, to amplify on the above:
With the Statistics Toolbox:
>> b1=regress(y',[x1' x2' x3' x4' x5' x6'])'
b1 =
1.0102 -0.0005 -0.0090 -6.8343 -0.2722 -13.6140
With the base Matlab backslash operator:
>> b2=[[x1' x2' x3' x4' x5' x6']\y']'
b2 =
1.0102 -0.0005 -0.0090 -6.8343 -0.2722 -13.6140
>>
Remarkable similarity, wot? :)
As you might expect, though, the Toolbox function returns some more interesting outputs:
>> [b,bint,r]=regress(y',[x1' x2' x3' x4' x5' x6']);
>> [b bint]
ans =
1.0102 0.9968 1.0237
-0.0005 -0.0085 0.0075
-0.0090 -0.0404 0.0223
-6.8343 -9.6170 -4.0516
-0.2722 -1.2170 0.6726
-13.6140 -37.7253 10.4972
>> sqrt(sum(r.*r)/length(r))
ans =
0.6206
>> [b,bint,r]=regress(y',[x1' x2' x4']);
>> [b bint]
ans =
1.0095 0.9980 1.0210
-0.0024 -0.0091 0.0043
-7.2257 -9.7197 -4.7316
>> sqrt(sum(r.*r)/length(r))
ans =
0.6663
>> [b,bint,r]=regress(y',[x1' x4']);
>> sqrt(sum(r.*r)/length(r))
ans =
0.6786
>>
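For reference, the sqrt(sum(r.*r)/length(r)) lines in the transcript compute the root-mean-square residual, which is just the SSE scaled by the number of observations n (n = 19 here):

```latex
\mathrm{RMS} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} r_i^{2}} = \sqrt{\frac{\mathrm{SSE}}{n}},
\qquad r_i = y_i - \hat{y}_i
```

Because it divides by n rather than by the residual degrees of freedom n - p (p = number of coefficients), it does not penalize extra terms; the usual unbiased estimate of the error variance is SSE/(n - p).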
Looking at the confidence intervals on the estimated coefficients, only a few of the variables are significant (in the six-variable fit, only the intervals for x1 and x4 exclude zero), and a much more parsimonious model is possible with essentially the same SSE as blindly including all six.
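The toolbox builds those intervals from the coefficients' standard errors. A rough base-Matlab sketch of the same idea follows, assuming the usual linear-model conditions (independent, normally distributed errors with constant variance). It fits the six-variable model with an intercept, gets standard errors from an economy QR factorization, and converts the t statistics to two-sided p-values with betainc via the identity P(|T| > t) = I_{v/(v+t^2)}(v/2, 1/2) for Student's t with v degrees of freedom. Because it includes the intercept, its numbers need not match the no-intercept transcript above.

```matlab
% Sample data from the question
x1 = [263,268,273,278,283,288,293,298,303,308,313,318,263,268,273,278,283,288,293];
x2 = [323,333,343,353,363,373,343,423,433,473,323,443,463,493,353,363,383,403,453];
x3 = [10,20,50,40,20,10,30,40,50,40,30,20,20,10,20,30,40,40,20];
x4 = [0.83,0.88,0.77,0.83,0.84,0.87,0.71,0.84,0.63,0.69,0.83,0.50,0.88,0.83,0.97,0.83,0.96,0.83,0.78];
x5 = [0.00101325,1.01325,0.000101325,0.101325,1.01325,0.000101325,0.101325,0.0101325,0.000101325,0.101325,0.0101325,0.000101325,0.101325,0.0101325,0.000101325,0.00101325,0.101325,1.01325,1.01325];
x6 = [0.05,0.06,0.06,0.07,0.08,0.07,0.09,0.1,0.06,0.05,0.04,0.08,0.09,0.1,0.07,0.06,0.06,0.08,0.05];
y  = [257.98,262.99,268.05,273.17,278.35,283.59,288.9,294.29,299.75,305.3,310.93,316.64,258.22,263.23,268.29,273.4,278.58,283.82,289.12];

X = [ones(numel(y),1) x1' x2' x3' x4' x5' x6'];  % intercept + six regressors
[n, p] = size(X);
b    = X \ y';                         % least-squares estimates
r    = y' - X*b;                       % residuals
dfe  = n - p;                          % residual degrees of freedom (19 - 7 = 12)
s2   = (r' * r) / dfe;                 % unbiased estimate of the error variance
[~, R] = qr(X, 0);                     % economy QR, so X'*X = R'*R
Rinv = R \ eye(p);                     % inv(X'*X) = Rinv*Rinv'
se   = sqrt(s2 * sum(Rinv.^2, 2));     % standard errors = sqrt(diag(s2*inv(X'*X)))
t    = b ./ se;                        % t statistics
pval = betainc(dfe ./ (dfe + t.^2), dfe/2, 0.5);  % two-sided p-values, Student t(dfe)
disp([b se t pval])                    % one row per coefficient, intercept first
```

A small p-value (say below 0.05) plays the same role as an interval that excludes zero.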
Your mission, should you choose to accept it, is to complete the analysis and judiciously choose the overall best model. You'll note that I have not considered or looked at any interaction terms.
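One way to check an interaction term is a partial F-test comparing y ~ 1 + x1 + x4 against the same model plus an x1.*x4 column. This is only a sketch: the x1*x4 pairing is an arbitrary example, picked because x1 and x4 look significant in the intervals above. The upper-tail F probability comes from betainc instead of the toolbox's fcdf, so it runs in base Matlab.

```matlab
% Data from the question (only the columns this test needs)
x1 = [263,268,273,278,283,288,293,298,303,308,313,318,263,268,273,278,283,288,293];
x4 = [0.83,0.88,0.77,0.83,0.84,0.87,0.71,0.84,0.63,0.69,0.83,0.50,0.88,0.83,0.97,0.83,0.96,0.83,0.78];
y  = [257.98,262.99,268.05,273.17,278.35,283.59,288.9,294.29,299.75,305.3,310.93,316.64,258.22,263.23,268.29,273.4,278.58,283.82,289.12];

yc = y';                                        % response as a column
Xr = [ones(numel(yc),1) x1' x4'];               % reduced model: 1 + x1 + x4
Xf = [Xr x1'.*x4'];                             % full model: adds the x1*x4 interaction
sse  = @(X) sum((yc - X*(X\yc)).^2);            % residual sum of squares of an LS fit
SSEr = sse(Xr);
SSEf = sse(Xf);
q    = size(Xf,2) - size(Xr,2);                 % number of added terms (1)
dfe  = numel(yc) - size(Xf,2);                  % residual d.o.f. of the full model (15)
F    = max((SSEr - SSEf)/q / (SSEf/dfe), 0);    % guard against tiny negative round-off
pval = betainc(dfe/(dfe + q*F), dfe/2, q/2);    % P(F(q,dfe) > F)
fprintf('F = %.4g, p = %.4g\n', F, pval);
```

A large p-value suggests the interaction adds little beyond x1 and x4; the same pattern tests any other nested pair of models.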
ADDENDUM:
Oversight: the above doesn't include an intercept term. To include it, write the model as
b1=regress(y',[ones(size(x1')) x1' x2' x3' x4' x5' x6'])'
and similarly for the other fits.
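To see what the intercept changes, here is a small base-Matlab comparison on the two-variable x1 + x4 model, fitted without and with the column of ones. Since the no-intercept model is nested in the other, its RMS residual can never be smaller.

```matlab
% Data from the question (only x1, x4 and y are needed here)
x1 = [263,268,273,278,283,288,293,298,303,308,313,318,263,268,273,278,283,288,293];
x4 = [0.83,0.88,0.77,0.83,0.84,0.87,0.71,0.84,0.63,0.69,0.83,0.50,0.88,0.83,0.97,0.83,0.96,0.83,0.78];
y  = [257.98,262.99,268.05,273.17,278.35,283.59,288.9,294.29,299.75,305.3,310.93,316.64,258.22,263.23,268.29,273.4,278.58,283.82,289.12];

yc = y';
X0 = [x1' x4'];                        % no intercept (as in the transcript above)
X1 = [ones(numel(yc),1) x1' x4'];      % explicit column of ones for the constant term
b0 = X0 \ yc;
b1 = X1 \ yc;
rms0 = sqrt(mean((yc - X0*b0).^2));    % same as sqrt(sum(r.*r)/length(r))
rms1 = sqrt(mean((yc - X1*b1).^2));
fprintf('RMS without intercept: %.4f, with intercept: %.4f\n', rms0, rms1);
```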


Aseri T on 29 Dec 2013
Dear dpb, I am really grateful. I will try your suggestion and get back to you soon. Thank you for your support.
T. Aseri
dpb on 29 Dec 2013
BTW, if you do have the Statistics Toolbox, look at
doc regstats
which does much of the work of computing the ancillary statistics you'll need.
I do wish TMW would take the last step of providing a nicely formatted table as an option, à la SAS and its ilk.
OBTW, NB: I neglected to include an intercept term in the preceding; see the ADDENDUM in the answer above. regstats handles this automagically, but with regress or the backslash operator the constant term must be coded into the model explicitly.
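regstats reports, among much else, R² and adjusted R². For a quick comparison of the full and reduced (x1, x4) models without the toolbox, those two quantities can be computed directly. This sketch assumes both models include an intercept, which the R² formula used here requires.

```matlab
% Data from the question
x1 = [263,268,273,278,283,288,293,298,303,308,313,318,263,268,273,278,283,288,293];
x2 = [323,333,343,353,363,373,343,423,433,473,323,443,463,493,353,363,383,403,453];
x3 = [10,20,50,40,20,10,30,40,50,40,30,20,20,10,20,30,40,40,20];
x4 = [0.83,0.88,0.77,0.83,0.84,0.87,0.71,0.84,0.63,0.69,0.83,0.50,0.88,0.83,0.97,0.83,0.96,0.83,0.78];
x5 = [0.00101325,1.01325,0.000101325,0.101325,1.01325,0.000101325,0.101325,0.0101325,0.000101325,0.101325,0.0101325,0.000101325,0.101325,0.0101325,0.000101325,0.00101325,0.101325,1.01325,1.01325];
x6 = [0.05,0.06,0.06,0.07,0.08,0.07,0.09,0.1,0.06,0.05,0.04,0.08,0.09,0.1,0.07,0.06,0.06,0.08,0.05];
y  = [257.98,262.99,268.05,273.17,278.35,283.59,288.9,294.29,299.75,305.3,310.93,316.64,258.22,263.23,268.29,273.4,278.58,283.82,289.12];

yc = y';
n  = numel(yc);
Xfull = [ones(n,1) x1' x2' x3' x4' x5' x6'];   % intercept + all six regressors
Xred  = [ones(n,1) x1' x4'];                   % intercept + x1 + x4
SST   = sum((yc - mean(yc)).^2);               % total sum of squares about the mean
sse   = @(X) sum((yc - X*(X\yc)).^2);          % residual sum of squares of an LS fit
r2    = @(X) 1 - sse(X)/SST;                   % coefficient of determination
adjr2 = @(X) 1 - (sse(X)/(n - size(X,2))) / (SST/(n - 1));  % penalizes extra terms
fprintf('full:    R2 = %.5f, adj R2 = %.5f\n', r2(Xfull), adjr2(Xfull));
fprintf('reduced: R2 = %.5f, adj R2 = %.5f\n', r2(Xred),  adjr2(Xred));
```

Plain R² can only rise as terms are added; adjusted R² is the one that can favour the smaller model.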
Aseri T on 31 Dec 2013
Yes, I do have the Statistics Toolbox and I am working with it. I first need to learn it; then I will be able to choose the best-fitting model with the fewest regressors by performing all the needed tests. Thank you for your valuable support; I'll be in touch.
Aseri T on 31 Dec 2013
Here is the problem: I've entered all the data in column format, all with the same number of rows (6696):


dpb on 1 Jan 2014
Edited: dpb on 1 Jan 2014

NB: you created a Matlab dataset object, Datas, so you must use dot notation to reference the individual variables within it. (BTW, although it doesn't matter to Matlab what a variable is named, "data" is plural in Latin; the singular is "datum". Common US English usage has corrupted this terribly.)
Use
Datas.Properties.VarNames
to see the variable names in the Datas object; then you get the actual data by using
Datas.VarName
where "VarName" is the name of the particular variable. Assuming the Excel sheet has headings with the names you've used above, something like
X=[ones(length(Datas),1) Datas.Ta Datas.Tabs ... Datas.eabs];
would appear to be correct. If there are no headers, then the default variable name 'Var1' would have been assigned and it will be an array in which it's somewhat simpler to reference --
b=regress(Datas.Var1(:,7), [ones(length(Datas),1) Datas.Var1(:,1:6)]);
Again, note that with regress you must specify the constant term in the model explicitly.
Since you say you have the Statistics Toolbox, I recommend going back to regstats to get directly the additional statistics you'll want or need to evaluate the quality of the model.
See
doc dataset % and related for details on using the dataset object
Alternatively, of course, you could use one of the other methods of reading in the file (xlsread comes to mind) to return the data as a base Matlab array. That would obviate all the dataset machinery, which may not be of much real use for your present purposes.
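A minimal sketch of the plain-array route. Here M is a hypothetical stand-in for what xlsread would return; it is built synthetically, with known coefficients bTrue, so that the block runs on its own. The column layout (six regressors, then the response in column 7) is an assumption matching the Var1(:,7) indexing above, and the filename in the comment is hypothetical.

```matlab
% In practice M would come from the spreadsheet, e.g.
%   M = xlsread('yourfile.xlsx');   % hypothetical file name
% Assumed layout: columns 1-6 = regressors, column 7 = response.
k     = (1:20)';
Xraw  = [sin(k) cos(k) sin(2*k) cos(2*k) k/20 (k/20).^2];  % six synthetic regressors
bTrue = [1.5; 2; -1; 0.5; 3; -2; 4];                      % intercept followed by six slopes
M     = [Xraw, [ones(20,1) Xraw]*bTrue];                  % column 7 = exact synthetic response

X = [ones(size(M,1),1) M(:,1:6)];   % explicit constant term, as regress also requires
b = X \ M(:,7);                     % least-squares coefficients
disp(b')                            % recovers bTrue here because the data are noise-free
```

With real data, replace the synthetic M with the xlsread result; b(1) is then the intercept and b(2:7) the slopes.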

Asked: 28 Dec 2013
Edited: dpb on 1 Jan 2014
