Why does fitlm say my "design matrix is rank deficient to within machine precision" despite my design matrix being full rank?
Here's some code to generate a design matrix similar to my own (but with made up data):
% Size of the one-hot encoding part of the design matrix
numRows = 1200;
numCols = 15;
% Create a [1200 * 15] matrix of zeros
designMatrix = zeros(numRows, numCols);
% Define the number of 1s needed in each column
onesPerColumn = numRows / numCols; % Should be 80 in this case
% Create a vector with the column indices repeated the correct number of times
columnIndices = repmat(1:numCols, 1, onesPerColumn);
% Shuffle the column indices to randomize the placement of 1s
randomizedIndices = columnIndices(randperm(length(columnIndices)));
% Assign 1s to the designMatrix based on the randomized indices
for i = 1:numRows
designMatrix(i, randomizedIndices(i)) = 1;
end
% Tack on last two columns to designMatrix to finalize design matrix
designMatrix = [designMatrix,rand(numRows,2)];
My [1200 * 17] design matrix is arranged as follows: columns 1:15 are a one-hot encoding of stimulus identity and columns 16:17 are two predictors that are continuous. Specifically, each row of my one-hot encoding (columns 1:15) contains a single 1 with the other 14 columns being 0s. Each stimulus is presented the same number of times across all observations such that
sum(designMatrix(:,1:15),1)
will be a [1*15] vector of 80 (because 1200/15=80).
Despite assessing that my designMatrix is full rank with the following:
rank(designMatrix)
I receive Warning: Regression design matrix is rank deficient to within machine precision when I run fitlm, such as:
% Make up values for dependent variable
y = rand(numRows,1);
% Fit the model using fitlm
mdl = fitlm(designMatrix,y)
Interestingly, fitrlinear is able to run everything without this warning...but I don't fully understand the difference between fitrlinear and fitlm.
Another interesting thing is that mdl = fitlm(designMatrix(:,2:end),y) doesn't produce the warning. This makes me think there's an issue with constructing the one-hot encoding of stimulus identity this way. However, I'm unaware of a better alternative, since I want a coefficient for each stimulus in the end.
What am I overlooking? Is there an issue with my one-hot encoding? Is there another way to run fitlm that is ideal? Thank you!
Answers (1)
John D'Errico
on 4 Sep 2024
Edited: John D'Errico
on 4 Sep 2024
Funny. I was sure within a second of reading your question what you had done wrong. And I think you will probably say, Oh. Yeah. That makes sense. But what you needed to know was a little quirk about fitlm.
numRows = 1200;
numCols = 15;
% Create a [1200 * 15] matrix of zeros
designMatrix = zeros(numRows, numCols);
% Define the number of 1s needed in each column
onesPerColumn = numRows / numCols; % Should be 80 in this case
% Create a vector with the column indices repeated the correct number of times
columnIndices = repmat(1:numCols, 1, onesPerColumn);
% Shuffle the column indices to randomize the placement of 1s
randomizedIndices = columnIndices(randperm(length(columnIndices)));
% Assign 1s to the designMatrix based on the randomized indices
for i = 1:numRows
designMatrix(i, randomizedIndices(i)) = 1;
end
% Tack on last two columns to designMatrix to finalize design matrix
designMatrix = [designMatrix,rand(numRows,2)];
rank(designMatrix)
cond(designMatrix)
So not only is your design matrix full rank, it is well conditioned too! Where could the problem lie?
If you read the help for fitlm, you will see this:
'Intercept' true (default) to include a constant term in the
model, or false to omit it.
And therein lies your problem. fitlm automatically adds a constant term to the model! It adds a column of ones to your matrix.
rank([ones(1200,1),designMatrix])
As you can see, the augmented matrix now has 18 columns, but its rank is still 17, so it is rank deficient.
cond([ones(numRows,1),designMatrix])
Yep. The matrix is now numerically singular. You need to tell fitlm to not do that.
y = rand(numRows,1);
mdl = fitlm(designMatrix,y,intercept = false)
As for why the matrix is always singular after appending the vector of ones: that stems from the one-hot encoding itself. Every row has exactly one 1 in columns 1:15, so the dependence would be there even if the stimuli were not balanced. Thus if you do this:
sum(designMatrix(:,1:15),2)
you will get a vector of all ones. And that ensures fitlm will have a hissy fit, UNLESS you tell it not to append a constant term to the model. (Personally, I always felt that was the wrong design decision to have made, but I did not write the code or the specs for fitlm. I'm sure they had valid reasons.)
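To make that dependence concrete, here is a minimal sketch that rebuilds a matrix like the one in the question and exhibits a nonzero coefficient vector that the augmented matrix maps to zero. The names oneHot, X and v are made up for this illustration, and the one-hot block is built with implicit expansion (R2016b or later) rather than the loop from the question.

```matlab
% Rebuild a design matrix like the one in the question (made-up data).
numRows = 1200;
numCols = 15;
idx = repmat(1:numCols, 1, numRows / numCols);   % each stimulus appears 80 times
idx = idx(randperm(numel(idx)));                 % shuffle the trial order
oneHot = double(idx(:) == 1:numCols);            % 1200-by-15 one-hot block
designMatrix = [oneHot, rand(numRows, 2)];

% The matrix fitlm effectively works with by default: a column of ones is added.
X = [ones(numRows, 1), designMatrix];

% Illustrative vector v: +1 on the constant column, -1 on every stimulus
% column, 0 on the two continuous predictors.
v = [1; -ones(numCols, 1); 0; 0];

residualNorm = norm(X * v)   % 0: the columns of X are linearly dependent
numColumns = size(X, 2)      % 18 columns ...
rankX = rank(X)              % ... but the rank is only 17
```

Because v is nonzero and X*v is zero, any multiple of v can be added to a coefficient vector without changing the fitted values, which is why the coefficients cannot be determined uniquely once the constant term is included.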
Oh, your last questions ...
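1. Why does fitrlinear run when fitlm has a problem? fitrlinear does fit a bias (intercept) term by default (its FitBias option defaults to true), but it does not solve the problem by factoring the design matrix. It fits a regularized model with an iterative solver (by default a ridge penalty, and an SVM learner rather than least squares), so it never performs the rank check that makes fitlm complain.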
2. Why does dropping the first column of the design matrix allow it to run with no hissy fit? Because now that column of ones (that will be appended by default) is no longer representable as a simple sum of the columns of your matrix.
isequal(ones(numRows,1),sum(designMatrix(:,1:15),2))
isequal(ones(numRows,1),sum(designMatrix(:,2:15),2))
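If you would rather keep the intercept and drop one stimulus column, you can still get one coefficient per stimulus. The sketch below assumes stimulus 1 is the dropped (reference) level; the variable names are made up for this illustration. Both models span the same column space, so the intercept plays the role of the stimulus 1 coefficient and each remaining stimulus coefficient is an offset from it.

```matlab
% Made-up data in the same layout as the question.
numRows = 1200;
numCols = 15;
idx = repmat(1:numCols, 1, numRows / numCols);
idx = idx(randperm(numel(idx)));
designMatrix = [double(idx(:) == 1:numCols), rand(numRows, 2)];
y = rand(numRows, 1);

% Model A: all 15 stimulus columns, no constant term.
mdlA = fitlm(designMatrix, y, 'Intercept', false);
coefA = mdlA.Coefficients.Estimate;            % 17-by-1

% Model B: constant term kept, stimulus 1 dropped as the reference level.
mdlB = fitlm(designMatrix(:, 2:end), y);
coefB = mdlB.Coefficients.Estimate;            % 17-by-1, intercept first

% Recover per-stimulus coefficients from model B:
% stimulus 1 = intercept, stimulus k = intercept + offset k (k = 2..15),
% and the two continuous slopes carry over unchanged.
recovered = [coefB(1); coefB(1) + coefB(2:numCols); coefB(numCols+1:end)];

maxDifference = max(abs(recovered - coefA))    % agrees up to rounding error
```

The two parameterizations give the same fitted values; they differ only in whether the reported stimulus terms are absolute levels (model A) or differences from stimulus 1 (model B).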