regstats: The design matrix has more predictor variables than observations.

When I run the regression code below, MATLAB shows this error:
Error using regstats (line 132)
The design matrix has more predictor variables than observations.
My code:
ud=unique(data_crsp(:,c.date)); % data_crsp is the data set; ud holds the distinct dates
fm_betas=NaN(length(ud),4); % 4 columns for the constant term, size, bm, pe
for i=1:length(ud) % We run a regression for each time period
    tdata=data_crsp(data_crsp(:,c.date)==ud(i),:);
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    reg_results = regstats(tdata(:,c.fut_ret), [log(tdata(:,c.cap)), log(tdata(:,c.bm)), tdata(:,c.pe)], 'linear', {'beta'});
    fm_betas(i,:)=reg_results.beta';
end
mean(fm_betas)
% I have checked that there are no Inf values in the data
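Checking for Inf values does not rule out the row-count problem, so a quick complementary check is to count the usable rows per date before running any regression. This is a sketch that assumes data_crsp is a numeric matrix; the column layout c and the small synthetic data are hypothetical stand-ins for the real CRSP panel. Rows with NaN in any regression column are left out of the count, because regstats ignores NaN values as missing data.

```matlab
% Hypothetical column layout; replace with your own c struct.
c = struct('date', 1, 'fut_ret', 2, 'cap', 3, 'bm', 4, 'pe', 5);

% Synthetic stand-in for data_crsp: date 1 has 6 firms, date 2 has only 2.
data_crsp = [ones(6,1),   rand(6,1), 1+rand(6,1), 1+rand(6,1), rand(6,1); ...
             2*ones(2,1), rand(2,1), 1+rand(2,1), 1+rand(2,1), rand(2,1)];

% ud lists the distinct dates; idx maps every row to its position in ud.
[ud, ~, idx] = unique(data_crsp(:, c.date));

% Count only rows with no NaN in the columns used by the regression.
used = [c.fut_ret, c.cap, c.bm, c.pe];
ok = all(~isnan(data_crsp(:, used)), 2);
nobs = accumarray(idx(ok), 1, [numel(ud), 1]);

% An intercept plus three slopes needs at least four rows per date.
min_rows = 4;
too_few = ud(nobs < min_rows);
fprintf('%d of %d dates have fewer than %d usable rows\n', numel(too_few), numel(ud), min_rows);
```

With this synthetic data, nobs is [6; 2] and date 2 is flagged as too small to fit.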

Answers (2)

dpb on 31 Jul 2022
The problem is NOT that there are NaN or Inf values in the data (although that could also be a cause, since they are treated as missing values). The problem is exactly what the error message says: after you select the subset of data for one or more of your time periods, the resulting height(tdata) is less than 4, the number of coefficients you are trying to estimate (3 independent variables plus 1 intercept).
"You can't do that!" You will have to fit only the periods that have at least that many points, and it would be far better to have well more than that.
You'll have to dig into the data set and either find where your selection logic isn't doing what you think, or find groupings that have enough data in them; we can't see the data.
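A minimal sketch of fitting only the periods that have enough data. The column layout c and the synthetic data (one date with 20 firms, one with only 3, returns built from an exact linear relation) are hypothetical. It also swaps the regstats call for the backslash operator with an explicit constant column, so it runs without the Statistics and Machine Learning Toolbox; for a full-rank design this gives the same least-squares coefficients, constant first. Periods that are too small are skipped and left as NaN.

```matlab
% Hypothetical column layout; replace with your own c struct.
c = struct('date', 1, 'fut_ret', 2, 'cap', 3, 'bm', 4, 'pe', 5);

% Synthetic stand-in for data_crsp: date 1 has 20 firms, date 2 only 3.
n = [20; 3];
data_crsp = [];
for d = 1:numel(n)
    cap = 1+9*rand(n(d),1);  bm = 0.5+rand(n(d),1);  pe = 5+20*rand(n(d),1);
    ret = 0.5 + 0.1*log(cap) - 0.2*log(bm) + 0.03*pe;   % exact linear relation
    data_crsp = [data_crsp; d*ones(n(d),1), ret, cap, bm, pe]; %#ok<AGROW>
end

ud = unique(data_crsp(:, c.date));
k = 4;                               % constant + size + bm + pe
fm_betas = NaN(length(ud), k);
for i = 1:length(ud)
    tdata = data_crsp(data_crsp(:, c.date) == ud(i), :);
    X = [log(tdata(:, c.cap)), log(tdata(:, c.bm)), tdata(:, c.pe)];
    y = tdata(:, c.fut_ret);
    ok = all(~isnan([X, y]), 2);     % rows regstats would also ignore
    if nnz(ok) < k
        continue;                    % too few rows: leave this period as NaN
    end
    % Same coefficients as regstats(y, X, 'linear', {'beta'}) for full-rank X.
    fm_betas(i, :) = ([ones(nnz(ok),1), X(ok,:)] \ y(ok))';
end

% Average only over the periods that were actually fitted.
valid = all(~isnan(fm_betas), 2);
avg_betas = mean(fm_betas(valid, :), 1);
```

The explicit dimension in mean(..., 1) matters: if only one period survives, mean of a 1-by-4 row vector without it would average across the four coefficients instead of across periods.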

Walter Roberson on 31 Jul 2022
reg_results = regstats(tdata(:,c.fut_ret), [log(tdata(:,c.cap)), log(tdata(:,c.bm)), tdata(:,c.pe)], 'linear', {'beta'});
You are providing three predictor variables and one response variable, with the model type 'linear'. In regstats, 'linear' means a constant term plus one linear term per predictor, so you are trying to find four coefficients: an intercept plus one for each of the three variables. Your calculation is effectively
[ones(size(tdata,1),1), log(tdata(:,c.cap)), log(tdata(:,c.bm)), tdata(:,c.pe)] \ tdata(:,c.fut_ret)
In order to do that, you need at least four rows of input.
tdata=data_crsp(data_crsp(:,c.date)==ud(i),:);
What happens if that test finds only 1, 2 or 3 rows?
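To see why a period with too few rows cannot be fitted, the sketch below builds the design matrix (a constant column plus the three predictors) for a single period with only two firms, using made-up numbers. Its rank is at most 2, so the four coefficients are not identifiable: two clearly different coefficient vectors fit the two observations exactly.

```matlab
% Two firms in one period (made-up numbers); four coefficients to estimate.
cap = [120; 45];  bm = [0.8; 1.3];  pe = [15; 22];  ret = [0.02; -0.01];
A = [ones(2,1), log(cap), log(bm), pe];   % design matrix with intercept: 2x4

fprintf('rows = %d, coefficients = %d, rank = %d\n', size(A,1), size(A,2), rank(A));

% rank(A) < 4, so there is no unique least-squares solution.
beta_a = pinv(A) * ret;                  % minimum-norm exact fit
beta_b = beta_a + null(A) * [1; -2];     % add a null-space direction: same fit
fprintf('max fit error a: %.1e, b: %.1e\n', ...
        max(abs(A*beta_a - ret)), max(abs(A*beta_b - ret)));
```

Because every such vector fits equally well, there is no single estimate to return, which is why regstats stops with the error in the question instead of picking one.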
