Numerical errors in corrcoef

1 visualización (últimos 30 días)
Wei Wang
Wei Wang el 8 de Ag. de 2019
Editada: Adam Danz el 14 de Jul. de 2020
I noticed that the output from corrcoef for a large matrix has some very tiny but non-zero numerical difference from the output I get by literally doing "for loops" to loop through all the columns myself. Is this a known issue? What would cause such difference? Here is a quick code I used to demonstrate this:
function matlab_corrcoef_issue
close all
A = randn(1000, 100);
R = corrcoef(A);
S = corrcoef_slow(A);
D = R-S;
plot(D(:));
return
function out = corrcoef_slow(M)
num_col = size(M,2);
out = zeros(num_col, num_col);
% This is to literally loop through
for i = 1:num_col
for j = i:num_col
r = corrcoef(M(:,i), M(:,j));
r = r(1,2);
out(i,j) = r;
out(j,i) = r;
end
end
return
  3 comentarios
Wei Wang
Wei Wang el 8 de Ag. de 2019
The corrceof_slow is in the code I pasted there, as well. YOu can copy and paste the whole thing into an open matlab new .m file and run it. Thank you!
Adam Danz
Adam Danz el 8 de Ag. de 2019
Oh jeez.... sorry. Not sure how I missed that. I think I tried to run each function individually.

Iniciar sesión para comentar.

Respuesta aceptada

Adam Danz
Adam Danz el 8 de Ag. de 2019
Editada: Adam Danz el 14 de Jul. de 2020
When you plot the results against each other (code below), the pairs all fall along the diagonal but with tiny aberrations that aren't perceptible unless you zoom in. The noise in your plot shows their differences which are, at maximum, max(D(:)) = 2.2204e-16, tiny.
figure();
R(R==1) = NaN;
S(S==1) = NaN;
plot(R,S,'o')
grid on
axis tight
axis equal
This is round-off error. A great article was written by a cofounder of MathWorks, Cleve Moler, that explains this problem. It's not that one set of results is more accurate than the other. Both sets of results have round-off error.
A deeper look:
To make your example more manageable I set the random number generator seed so we can get the same random matrix on each run. I also shortened your example matrix to 12x10 and added more variation.
rng(9999)
A = randn(12, 10) + (1:10);
You can determine which elements differ by executing
R = corrcoef(A);
S = corrcoef_slow(A); % from your question
R == S
That shows us that the 3rd pair of values (row 1, col 3) are the first to differ between the matrix and loop methods. So where does this difference start to occur in the algorithm?
With your data, one of the first things done in corrcoef() is to compute the covariance matrix using cov(). Already, the covariance results differ between the matrix and loop methods for the 3rd pair of data.
rng(9999)
A = randn(12, 10) + (1:10);
c1 = cov(A);
c2 = cov(A(:,[1,3]));
c1(1,3) - c2(1,2)
% ans =
% 5.5511e-17
If you dig deeper into the cov() function, the first line that produces a difference between the two methods is the line below where xc are your normalized data and denom is just the height of your data minus one. "xc" has the same values for both of your methods (though they are at different indices) but when you multiply xc by its transpose, that's when roundoff error starts to occur in both of your methods.
c = (xc' * xc) ./ denom;
  2 comentarios
Adam Danz
Adam Danz el 9 de Ag. de 2019
Wei Wang's answer moved here as a comment
Wonderful! Thank you so much for the explanation!
Adam Danz
Adam Danz el 9 de Ag. de 2019
Editada: Adam Danz el 12 de Ag. de 2019
I'm glad it was helpful

Iniciar sesión para comentar.

Más respuestas (0)

Productos

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by