If the definition of covariance is (x-mean(x))'*(x-mean(x)), why does cov(x) not return the same result? Thank you.

The 'cov' function normalizes by dividing by N-1 where N is the number of observations, which in this case is the number of rows in your matrix x.
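A minimal sketch of that check, using a small made-up matrix x (the data values are hypothetical, chosen only for illustration). It assumes R2016b or later so that x - mean(x, 1) expands implicitly; on older releases bsxfun(@minus, x, mean(x, 1)) would be needed instead.

```matlab
% Hypothetical example data: 5 observations (rows) of 3 variables (columns).
x = [2 4 1; 3 7 5; 5 6 2; 8 9 4; 7 3 6];
N = size(x, 1);                 % number of observations

xc = x - mean(x, 1);            % subtract each column mean
S  = xc' * xc;                  % unnormalized sum of cross products
C  = S / (N - 1);               % normalize the way cov does by default

Cref    = cov(x);               % built-in result for comparison
maxDiff = max(abs(C(:) - Cref(:)));
fprintf('largest difference from cov(x): %g\n', maxDiff);
```

The difference should be at the level of floating-point round-off, whereas S on its own is larger than cov(x) by the factor N-1.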

cov computation in Matlab

Bi Bu on 27 Oct 2017

Thanks. So by default MATLAB uses the sample covariance (normalized by n-1). Is this correct?

Roger Stafford on 27 Oct 2017

Yes, by default it divides by the number of samples minus one, except in the case of a single sample (heaven forbid!), where it divides by 1.

Bi Bu on 27 Oct 2017

Thank you!

Steven Lord on 27 Oct 2017

And if you want it to normalize by N instead of N-1 even when N > 1, set the input argument that the documentation calls w to 1, instead of omitting it or setting it to 0.
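As a short illustration of the w argument, here is a sketch on made-up data (the matrix is hypothetical). It compares the default call with cov(x, 1) and checks that the two differ only by the factor (N-1)/N.

```matlab
x = [2 4 1; 3 7 5; 5 6 2; 8 9 4; 7 3 6];   % hypothetical data, 5 observations
N = size(x, 1);

C0 = cov(x);        % w omitted (same as w = 0): divides by N-1
C1 = cov(x, 1);     % w = 1: divides by N

% The two results should agree after rescaling by (N-1)/N.
maxDiff = max(max(abs(C1 - C0 * (N - 1) / N)));
fprintf('largest difference after rescaling: %g\n', maxDiff);
```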

Bi Bu on 27 Oct 2017

Thanks, this is helpful.

Bi Bu on 28 Oct 2017

Dear Steven, one more question popped up: does the "mean" function in MATLAB have an option to divide by n-1 instead of n? In the case of "cov", the expected value (mean) of the products is taken by dividing by n-1 and not n. So if I wanted to write this formula differently and use "mean", I wouldn't have the option to use n-1. Thanks.

Roger Stafford on 28 Oct 2017

@Bi Bu: It would make no sense to divide by n-1 when taking the mean. To get an unbiased estimate from the sum of n samples, one needs to divide by n. That is, assuming the samples all have the same expected value, the sum of n of them has an expected value of n times the expected value of any one of them, so such a sum should be divided by n.

However, the definition of the covariance between two variables involves the mean of each of them. If one uses samples to estimate these means along with estimating their covariance, it can be shown by rather simple mathematics that the sum of products must be divided by n-1 rather than n to yield an unbiased estimate of the theoretical covariance. This is due to the expected deviation of these sample means from their true means. If you are interested in the mathematics involved, there are many such demonstrations on the internet. One of them is located at:

https://www.youtube.com/watch?v=D1hgiAla3KI
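On the practical side of the question, mean has no normalization flag, but a covariance written with mean can simply be rescaled by n/(n-1). A sketch on made-up paired data (the vectors x and y are hypothetical):

```matlab
% Hypothetical paired samples.
x = [2; 3; 5; 8; 7];
y = [4; 7; 6; 9; 3];
n = numel(x);

p = (x - mean(x)) .* (y - mean(y));   % products of deviations from the sample means

cN   = mean(p);                 % divides by n
cNm1 = mean(p) * n / (n - 1);   % rescaled, the same as sum(p) / (n - 1)

C = cov(x, y);                  % 2-by-2 matrix; C(1,2) is the covariance of x and y
fprintf('mean-based: %g, cov: %g\n', cNm1, C(1, 2));
```

The rescaled value should match the off-diagonal entry of cov(x, y), while cN should match the corresponding entry of cov(x, y, 1).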

Bi Bu on 28 Oct 2017

Thank you so much! Great response. I will definitely watch the video. However, I can't see how the simple mean of a sample wouldn't be as biased as the means of the samples used to compute covariance. They are samples in both cases, after all.

Roger Stafford on 28 Oct 2017

Edited: Roger Stafford on 28 Oct 2017

@Bi Bu: No, the two expressions approximating the mean and the covariance are of a different nature. In the case of the mean the expression is a simple sum so that its expected value is simply the sum of the n separate means, and that certainly indicates the need to divide by n, not n-1 (where n is the number of terms). To divide by n-1 would be to give a biased estimate.

On the other hand, the expression for approximating the covariance is a sum of products, each of which depends in part on approximations to the means of the two variables. It is this latter source of variation that somewhat reduces the expected value of the expression and results in a need to divide by the smaller n-1, not n. There is no such feature in the simple mean computation.

Remember, producing an unbiased estimate is defined as having the expected value of the approximation be precisely equal to the theoretical mean or covariance, so there is no choice in the matter in either case.
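The effect can also be seen numerically. The following is a rough simulation sketch, not a proof: the sample size, the number of trials, and the true covariance rho are all arbitrary choices made for illustration, and the data are drawn as correlated standard normal pairs.

```matlab
rng(0);                      % fix the seed so the run is repeatable
n      = 5;                  % small sample size makes the bias easy to see
trials = 200000;             % number of simulated samples (arbitrary choice)
rho    = 0.6;                % assumed true covariance of two unit-variance variables

% Each column is one sample of n pairs; cov(x, y) = rho by construction.
x = randn(n, trials);
y = rho * x + sqrt(1 - rho^2) * randn(n, trials);

% Sum of products of deviations from the sample means, one value per column.
S = sum((x - mean(x, 1)) .* (y - mean(y, 1)), 1);

estN   = mean(S / n);        % average estimate when dividing by n
estNm1 = mean(S / (n - 1));  % average estimate when dividing by n-1

fprintf('true %.3f | divide by n: %.3f | divide by n-1: %.3f\n', rho, estN, estNm1);
```

Averaged over many samples, the n-1 version should land close to rho, while the n version should come out low by roughly the factor (n-1)/n.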

By the way, the website demonstration I mentioned above is actually concerned with the variance of one variable rather than the covariance of two variables. However, its argument is very similar to that needed for covariance, so it should serve to show the need for dividing by n-1 for the covariance computation. I would give the proof here, but I’m afraid it would take up quite a lot of space in this supposedly simple “answer”.

Bi Bu on 29 Oct 2017

It would be great if you could write the answer. Is it too long?
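For reference, the standard argument is fairly short. What follows is a sketch rather than the proof the thread had in mind. It assumes the pairs (x_i, y_i) are independent of one another and share the same means \mu_x, \mu_y and the same covariance \sigma_{xy}; these symbols are introduced here and do not appear in the discussion above.

```latex
% Assumptions: (x_i, y_i), i = 1..n, are independent pairs with common means
% \mu_x, \mu_y and common covariance \sigma_{xy}; \bar{x}, \bar{y} are sample means.
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
  &= \sum_{i=1}^{n} (x_i - \mu_x)(y_i - \mu_y) - n(\bar{x} - \mu_x)(\bar{y} - \mu_y), \\
E\Big[\sum_{i=1}^{n} (x_i - \mu_x)(y_i - \mu_y)\Big]
  &= n\,\sigma_{xy}, \\
E\big[(\bar{x} - \mu_x)(\bar{y} - \mu_y)\big]
  &= \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} E\big[(x_i - \mu_x)(y_j - \mu_y)\big]
   = \frac{\sigma_{xy}}{n}, \\
E\Big[\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})\Big]
  &= n\,\sigma_{xy} - \sigma_{xy} = (n-1)\,\sigma_{xy}.
\end{align*}
```

In the third line the terms with i different from j vanish because the pairs are independent, which leaves n terms equal to \sigma_{xy}. The last line shows that the sum of products must be divided by n-1 for its expected value to equal \sigma_{xy}. Setting y = x gives the corresponding result for the variance.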
