# nancov gives strange results

Sagar on 8 Jul 2015
Answered: Brendan Hamm on 8 Jul 2015
I have a standardized matrix (`aod_normalized`) of a variable whose size is 3653-by-12, where 3653 is the number of observations and 12 is the number of stations. There are many NaNs in the dataset (10-50% at different stations). The standard deviation of each column, as calculated with `nanstd`, is 1 for all the stations.
I calculated the covariance matrix of the data using `cov_matrix = nancov(aod_normalized)`.
I expected the diagonal elements of `cov_matrix` (of size 12-by-12) all to equal 1. However, all the diagonal elements are greater than 1, which looks a little strange to me. When I tried the same procedure on another dataset that contains no NaNs, the diagonal elements were all equal to 1, as expected. Could someone explain why I am getting this result?

Brendan Hamm on 8 Jul 2015
When you use `nanstd`, each standard deviation is computed from that column alone, so it can ignore every NaN in that variable. A covariance, however, involves two variables at a time. For this reason, `nancov` by default removes every observation (row) that has a NaN in any column before performing any calculations. This means you are discarding valid data points from the calculation of a variable's variance simply because the corresponding observation of some other variable was NaN. Each diagonal element is therefore the variance of that column over the complete rows only, which need not equal 1. To get the result you expect, use the `'pairwise'` flag, which computes each element (i,j) from the rows where both column i and column j are present:
```matlab
Y = nancov(...,'pairwise')
```
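To see the effect concretely, here is a minimal sketch on a small made-up dataset (the matrix `X` and all variable names are hypothetical). It standardizes each column using only its observed values, as in the question, and then compares the variances obtained with listwise deletion (the default behaviour described above) against those obtained with pairwise deletion. The covariances are computed with plain loops over core MATLAB functions, assuming the usual N-1 normalization, so it runs without the Statistics Toolbox; it mirrors the deletion rules described above rather than reproducing `nancov` itself.

```matlab
% Hypothetical toy data: 6 observations (rows) x 2 stations (columns)
X = [1 2; 2 NaN; 3 1; 4 3; NaN 5; 5 4];
p = size(X, 2);

% Standardize each column using only its observed values,
% like (X - nanmean(X)) ./ nanstd(X); NaN entries stay NaN
Z = X;
colVar = zeros(1, p);
for j = 1:p
    obs = ~isnan(X(:, j));
    Z(:, j) = (X(:, j) - mean(X(obs, j))) / std(X(obs, j));
    colVar(j) = var(Z(obs, j));   % equals 1, like nanstd(Z).^2
end

% Listwise deletion (nancov's default): keep only rows with no NaN at all
complete = all(~isnan(Z), 2);
listwiseCov = cov(Z(complete, :));

% Pairwise deletion ('pairwise' option): entry (i,j) uses every row
% where both column i and column j are observed
pairwiseCov = zeros(p);
for i = 1:p
    for j = 1:p
        rows = ~isnan(Z(:, i)) & ~isnan(Z(:, j));
        a = Z(rows, i) - mean(Z(rows, i));
        b = Z(rows, j) - mean(Z(rows, j));
        pairwiseCov(i, j) = (a' * b) / (nnz(rows) - 1);
    end
end

disp(colVar)               % per-column variances: 1 and 1
disp(diag(listwiseCov)')   % about 1.1667 and 0.6667 (rows 2 and 5 dropped)
disp(diag(pairwiseCov)')   % 1 and 1, matching colVar
```

Rows 2 and 5 each contain a single NaN, so listwise deletion removes them for both columns. The four remaining rows give variances of 7/6 and 2/3, which shows the diagonal can move in either direction, while the pairwise diagonal reproduces the per-column variance of 1.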
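Note: because each element of the covariance matrix is now computed from a different subset of observations, the resulting matrix is no longer guaranteed to be positive semidefinite (it can have negative eigenvalues), whereas a covariance matrix computed from one common set of rows always is.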
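The caveat above is easy to trigger. Below is a deliberately contrived sketch (hypothetical data) in which each pair of columns is observed together on a different, non-overlapping set of rows. The pairwise estimates then imply a correlation of 2/(4/3) = 1.5 between columns 1 and 2, which no valid covariance matrix allows, and `eig` reports a negative eigenvalue. The loop is a stand-in for the pairwise rule, assuming N-1 normalization, not the actual source of `nancov`.

```matlab
% Hypothetical toy data: each pair of columns is observed together
% on a different, non-overlapping set of rows
X = [ -1   -1  NaN;
       1    1  NaN;
     NaN   -1   -1;
     NaN    1    1;
      -1  NaN    1;
       1  NaN   -1];
p = size(X, 2);

% Pairwise-deletion covariance: entry (i,j) uses rows where both
% column i and column j are observed, normalized by (count - 1)
C = zeros(p);
for i = 1:p
    for j = 1:p
        rows = ~isnan(X(:, i)) & ~isnan(X(:, j));
        a = X(rows, i) - mean(X(rows, i));
        b = X(rows, j) - mean(X(rows, j));
        C(i, j) = (a' * b) / (nnz(rows) - 1);
    end
end

disp(C)          % [4/3 2 -2; 2 4/3 2; -2 2 4/3]
disp(eig(C)')    % -8/3, 10/3, 10/3: a negative eigenvalue, so C is not PSD
```

Real data rarely behaves this badly, but with 10-50% missing values per station it is worth checking `min(eig(Y))` before using a pairwise covariance matrix in anything that assumes positive semidefiniteness.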