If covariance is defined as (x-mean(x))'*(x-mean(x)), why doesn't cov(x) return the same result? Thank you.

Accepted Answer

Roger Stafford on 27 Oct 2017

The 'cov' function normalizes by dividing by N-1 where N is the number of observations, which in this case is the number of rows in your matrix x.
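
The relationship can be checked numerically. The following is a minimal sketch using a small made-up matrix (the data and variable names are only illustrative), and it assumes R2016b or later so that x - mean(x) works through implicit expansion:

```matlab
% Hypothetical data: 5 observations (rows) of 3 variables (columns).
x = [2 4 1; 3 7 5; 5 1 2; 8 6 9; 4 4 4];
N = size(x, 1);            % number of observations

% Center each column. On releases before R2016b, use
% bsxfun(@minus, x, mean(x)) instead.
xc = x - mean(x);

S = xc' * xc;              % unnormalized sum of products (the formula in the question)
C = S / (N - 1);           % the same matrix, normalized as cov does by default

D = C - cov(x);
maxDiff = max(abs(D(:)));  % should be zero up to round-off
```

So the expression in the question differs from cov(x) only by the constant factor N-1.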

10 Comments

Bi Bu on 27 Oct 2017
Thanks. So by default MATLAB computes the sample covariance (dividing by n-1). Is this correct?
Roger Stafford on 27 Oct 2017
Yes, by default it divides by the number of samples minus one, except that with only one sample (heaven forbid!) it divides by 1.
Bi Bu on 27 Oct 2017
Thank you!
Steven Lord on 27 Oct 2017
And if you want it to normalize by N instead of N-1 even when N > 1, specify the input argument named w in the documentation as 1 instead of omitting it or specifying it as 0.
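
A short sketch of the w argument, again on made-up data (the matrix and variable names are only illustrative, and implicit expansion from R2016b or later is assumed):

```matlab
% Hypothetical data: 5 observations (rows) of 3 variables (columns).
x = [2 4 1; 3 7 5; 5 1 2; 8 6 9; 4 4 4];
N = size(x, 1);
xc = x - mean(x);          % centered columns

Cn  = cov(x, 1);           % w = 1: normalize by N
Cn1 = cov(x, 0);           % w = 0: normalize by N-1, same as cov(x)

errN  = max(max(abs(Cn  - (xc' * xc) / N)));        % round-off only
errN1 = max(max(abs(Cn1 - (xc' * xc) / (N - 1))));  % round-off only
```

The two results differ only by the factor (N-1)/N, so the choice matters less as N grows.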
Bi Bu on 27 Oct 2017
Thanks, this is helpful.
Bi Bu on 28 Oct 2017
Dear Steven, one more question popped up: does the "mean" function in MATLAB have an option to divide by n or n-1? In the case of "cov", the expected value (mean) of the products is taken by dividing by n-1 rather than n. So if I wanted to write this formula differently and use "mean", I wouldn't have the option to use n-1. Thanks.
Roger Stafford on 28 Oct 2017
@Bi Bu: It would make no sense to divide by n-1 when taking the mean. To get an unbiased estimate from the sum of n samples, one needs to divide by just n. That is, assuming the samples all have the same expected value, the sum of n of them has an expected value of n times the expected value of any one of them, so such a sum should be divided by just n.
However, the definition of the covariance between two variables involves the mean of each of them. If one uses samples to estimate these means along with estimating their covariance, it can be shown by rather simple mathematics that the sum of products must be divided by n-1 rather than n to yield an unbiased estimate of the theoretical covariance. This is due to the expected deviation of the sample means from the true means. If you are interested in the mathematics involved, there are many such demonstrations on the internet. One is located at:
https://www.youtube.com/watch?v=D1hgiAla3KI
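
The two claims above can also be illustrated by simulation. This is only a sketch under assumed settings: the distribution (x with mean 3 and variance 1, y = x + independent unit noise, so the true covariance is 1), the sample size, and the number of trials are all chosen for illustration, and implicit expansion (R2016b or later) is assumed.

```matlab
rng(0);                     % fix the seed so the run is repeatable
n      = 5;                 % observations per sample (kept small to make the bias visible)
trials = 200000;            % number of independent samples

% Each column is one sample of n observations.
X = 3 + randn(n, trials);   % true mean 3, true variance 1
Y = X + randn(n, trials);   % true covariance with X is 1

% Mean: dividing the sum by n is unbiased, dividing by n-1 is not.
meanN  = mean(sum(X) / n);         % close to 3
meanN1 = mean(sum(X) / (n - 1));   % close to 3*n/(n-1) = 3.75

% Covariance: sums of products about the *sample* means.
S     = sum((X - mean(X)) .* (Y - mean(Y)));
covN1 = mean(S / (n - 1));         % close to the true value 1
covN  = mean(S / n);               % close to (n-1)/n = 0.8, i.e. biased low
```

Averaged over many samples, dividing by n recovers the mean and dividing by n-1 recovers the covariance, while the other two combinations are off by a factor of n/(n-1) and (n-1)/n respectively.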
Bi Bu on 28 Oct 2017
Thank you so much! Great response. I will definitely watch the video. However, I can't see how the simple mean of a sample wouldn't be as biased as the means of the samples used to compute covariance. They are samples in both cases, after all.
Roger Stafford on 28 Oct 2017
Edited: Roger Stafford on 28 Oct 2017
@Bi Bu: No, the two expressions approximating the mean and the covariance are of a different nature. In the case of the mean, the expression is a simple sum, so its expected value is simply the sum of the n separate means, and that certainly indicates the need to divide by n, not n-1 (where n is the number of terms). Dividing by n-1 would give a biased estimate.
On the other hand, the expression approximating the covariance is a sum of products, which depends in part on approximations to the means of the two variables. It is this latter source of variation that somewhat reduces the expected value of the expression and results in the need to divide by the smaller n-1, not n. There is no such feature in the simple mean computation.
Remember, producing an unbiased estimate is defined as having the expected value of the approximation be precisely equal to the theoretical mean or covariance, so there is no choice in the matter in either case.
By the way, the website demonstration I mentioned above is actually concerned with the variance of one variable rather than the covariance of two variables. However, its argument is very similar to that needed for covariance, so it should serve to show the need for dividing by n-1 for the covariance computation. I would give the proof here, but I’m afraid it would take up quite a lot of space in this supposedly simple “answer”.
Bi Bu on 29 Oct 2017
It would be great if you could write the answer. Is it too long?
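
For reference, the argument alluded to above is fairly short. What follows is a sketch of the standard textbook derivation, not Roger Stafford's own write-up. It assumes the pairs $(x_i, y_i)$, $i = 1, \dots, n$, are independent and identically distributed with means $\mu_x$, $\mu_y$ and covariance $\sigma_{xy}$; the symbols are introduced here for illustration.

```latex
% Sample means
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i , \qquad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i .

% Rewrite the sum of products about the sample means in terms of the true means
\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
  = \sum_{i=1}^{n} (x_i - \mu_x)(y_i - \mu_y) \;-\; n\,(\bar{x} - \mu_x)(\bar{y} - \mu_y) .

% Expected value of each piece
E\big[(x_i - \mu_x)(y_i - \mu_y)\big] = \sigma_{xy} , \qquad
E\big[(\bar{x} - \mu_x)(\bar{y} - \mu_y)\big]
  = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} E\big[(x_i - \mu_x)(y_j - \mu_y)\big]
  = \frac{\sigma_{xy}}{n} ,

% because the cross terms with i \neq j vanish by independence. Hence
E\Big[\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})\Big]
  = n\,\sigma_{xy} - n\cdot\frac{\sigma_{xy}}{n}
  = (n-1)\,\sigma_{xy} .
```

Dividing the sum of products by n-1 therefore gives an estimate whose expected value is exactly $\sigma_{xy}$, while dividing by n gives $(n-1)\sigma_{xy}/n$. The subtracted term comes entirely from the sample means deviating from the true means, which is why no such correction arises for the plain mean.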

Asked: 27 Oct 2017

Commented: 29 Oct 2017
