171 views (last 30 days)

Show older comments

Chunru
on 29 May 2021

% Assume that real and imaginary parts are independent

% 1/sqrt(2) makes the variaonce to be 1

x = 1/sqrt(2)*(rand(N, 1) +1i*rand(N,1));

Walter Roberson
on 29 May 2021

I made a mistake in my testing earlier: I now agree with 1/sqrt(2) as the scaling.

Walter Roberson
on 29 May 2021

Edited: Walter Roberson
on 29 May 2021

There are two approaches to ensuring a sample mean of exactly 0, and a sample variance of exactly 1.

format long g

N = 5; % number of snapshots

x = 1/2*(randn(N, 1) +1i*randn(N,1))

xMean = mean(x)

xVariance = var(x)

First approach: modify the values as a group

x2 = x - xMean;

xMean2 = mean(x2) %verify it is exactly 0, to within round-off error

x2 = x2 ./ sqrt(xVariance);

xVariance = var(x2)

x2

The number of degrees of freedom of the original x is 10 -- N*2 independent random values.

The number of degrees of freedom of the modified group, x2, is only 8: although each of the values is still random, together as a group they are correlated in ways that reduces the freedom.

Second approach: take only initial entries and generate the remaining values as whatever is necessary to make the group fit. This is proving to be more difficult than I thought it would be.

Paul
on 2 Jun 2021

I think I understand what your professor is saying and where your confusion lies. For now, I'm going to assume that all the variables in question are real, not complex, which may make it easier to illustrate the concepts.

In this problem, the goal is to obtain L observations of a random variable vector, X, of length N, such that the L observations of X, denoted as x1 - xL (note the lower case x to distinguish observations from the random variable) are drawn from an N-dimensional normal distribution with a theoretical expected value of zeros(N,1) and theoretical covariance Rxx.

Let Z be a random variable vector of length N, with normal, N-dimensional probability density function with expected value Muz = zeros(N,1) and theoretical covariance Rzz = eye(N). Let X = E*sqrt(D)*Z, where E and D are the eigenvectors and diaganal matrix of eigenvalues of Rxx respectively. With this relationship between random variable vectors X and Z, X will have the desired properties: normal, Mux = zeros(N,1), and theoretical covariance Rxx.

So how do we obtain observations x1 - xL? The approach is to generate observations z1 - zL and the apply E and sqrt(D) to those observations. Then the observations xi will be as if they were obtained by sampling from a distribution of X.

As Walter showed, the observation of the so-called practical mean computed from the L observations z will not be zeros(n,1) and the observed practical covariance will not be eye(N). (As an aside, I've never heard that term "practical," but that's o.k.) Similarly, the observed value of the practical mean computed from observations x will not be zeros(n,1) and the observed value of the practical covariance Rhatxx will not equal Rxx. But as long as you're properly generating the observations z in the first place, the whole procedure is giving you what you want, and you should not modify any of the data to force the observed practical mean to be zero or the observed practical covariance to be Rxx. Doing so will corrupt the statistical properties of the data such that downstream calculations will be suspect.

Let's look at an example, with

N = 2;

Rxx = [1 .5;.5 2]; % theoretical covariance

Let's try the procedure using L = 10 observations of Z. Because the theoretical covarance of Z is Rzz = eye(N), we know that the elements of Z are indepdendent so we can generate Z with randn

L = 10;

rng(100);

z = randn(N,L); % L observations of an N-dimensional vector

Now generate the observations x of X

[E,D] = eig(Rxx);

x = E*sqrt(D)*z;

Now compute the observed practical mean of x, which is an estimate of the expected value of X, i.e., Mux

muhatx = mean(x,2)

As expected, the observed practical mean is not zeros(2,1). In fact, the practical mean itself is a random vector that has its own probability distribution.

Now compute the observed practical covariance Rhatxx, which is an estimate of Rxx. We can do this two ways: using Matlab's cov() function or using the equation in the notes you posted. Note that the outer product is scaled by L, the number of observations.

rhatxx1 = cov(x.') % transpose needed because cov wants the observations going down the rows

rhatxx2 = x*x'/L % scaled outer product

Neither of these are equal (or even close) to Rxx; again the elements of Rhatxx are themselves random variables with their own distributions. But why aren't the observed covariances equal to each other? There are two reasons. cov() subtracts muhatx from the observations, and cov() scales by L-1. Check:

rhatxx3 = (x-muhatx)*(x-muhatx)'/(L-1) % equals rhatxx1

I think this means that cov() isn't making any assumptions about the distribution of X, whereas the outer product is based on knowing that Mux = zeros(2,1). The point to take away is that the observations in x can be treated as if they were drawn from a distribution with theoretical mean Mux = zeros(2,1) and covariance Rxx, even if the observed mean muhatx and covariance rhatxx are different than the theoretical values. In fact, the probability that Rhatxx (Muhatx) will equal Rxx (Mux) is zero. To see this point more concretely, increase L to generate more observations. Doing so we expect muhatx and rhatxx to come closer to Mux = zeros(2,1) and Rxx, but the estimates will never be exact. But that's o.k. All you want to do is convince yourself that the observations x, however many there are, are properly viewed as being taken from the distribution of X, with parameters Mux and Rxx.

L = 10000;

z = randn(N,L); % L observations of an N-dimensional vector

x = E*sqrt(D)*z;

muhatx = mean(x,2)

rhatxx1 = cov(x.') % transpose needed because cov wants the observations going down the rows

rhatxx2 = x*x'/L % scaled outer product

rhatxx3 = (x-muhatx)*(x-muhatx)'/(L-1) % equals Rhatxx1

rhatxx3 - rhatxx1

Note that by increasing L we weren't guaranteed that rhatxx and muhatx would be better estimates as compared to L = 10, but that was very likely to be the case.

In summary: don't modify your observations in an attempt to enforce the theoretical mean and covariance on the observed values. Doing so will corrupt your statistical results if you run your model over and over, as in a Monte Carlo simulation.

Now, in your problem you're dealing with complex, N-dimensional random vectors. The only thing I'm not sure about is what the notes mean where Z is defined just above equation 12. I'm not sure there is enough information there to uniquely define the distribution of Z. Maybe that doesn't matter, i.e., any Z that satisfies eqn(12) will do. One possibility, as discussed in the other answer, is Z = Zr + 1i*Zi, where Zr and Zi are each N-dimensional, real, random vecotrs with N-dimensional, normal distribution with expected value mu = zeros(N,1) and covariance Rzz = eye(N)/2.

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!