
I am updating my question because it was a bit confusing before.

I hope it is clear now (if it is not, please ask and I will clarify):

Assume I have 16 random variables X1,..,X16 and 16 constant numbers C1,..,C16.

I want to generate samples of (X1+...+X16), where X1,...,X16 are random variables (X1 to X8 are normally distributed and X9 to X16 are Weibull), and the generated samples must always respect the following constraints:

X1/(X1+ ... +X16)=C1

...

X16/(X1 + ... + X16)=C16

where C1,...,C16 are constant values, known prior to the generation of the RVs.

(C1+...+C16 is always 1, and all the Ci are positive numbers.)

Thanks a lot for any suggestions

Bruno Luong
on 9 Jan 2021

Edited: Bruno Luong
on 9 Jan 2021

I don't quite understand what you want, but from your request you seem to want to compute

X1 = C1 * S

X2 = C2 * S

...

X16 = C16 * S

With S being the sum of the X_i. Therefore you just need to generate S (since all the C are known).

If you know the pdfs of the X_i, then in principle you can compute the theoretical pdf of S.

For example, if the X_i are independent and normally distributed with the same standard deviation sigma, then S is normally distributed with standard deviation sigma*sqrt(16).

Then all you need to do is generate S randomly from its pdf, apply the scaling above, and you are done.

If you are not able to compute the theoretical pdf of the sum of mixed normal and Weibull variables (admittedly not trivial), note that the sum converges toward the normal distribution with

meanS := mean(S) = sum(mean(X_i))

stdS := std(S) = sqrt(sum(std(X_i)^2))

IMO if you generate S as

S = meanS + stdS*randn()

then

X = C * S

with C being the array of 16 known values; it should fit your need well.
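As a quick numerical check of this recipe (a Python sketch rather than MATLAB, with made-up parameters for a smaller 4-variable case: 2 normals and 2 Weibulls), the normal approximation of S followed by the scaling X = C*S satisfies the ratio constraints by construction:

```python
import numpy as np
from math import gamma

rng = np.random.default_rng(0)

# Made-up parameters for a small 4-RV case: 2 normals + 2 Weibulls
mu, sigma = np.array([0.0, 1.0]), np.array([1.0, 2.0])
a, b = [1.0, 1.0], [1.0, 2.0]          # Weibull scale a, shape b
C = np.array([0.1, 0.2, 0.3, 0.4])     # known ratios, sum to 1

# Moments of S = sum(X_i) from the standard normal/Weibull formulas
w_mean = sum(ai * gamma(1 + 1/bi) for ai, bi in zip(a, b))
w_var = sum(ai**2 * (gamma(1 + 2/bi) - gamma(1 + 1/bi)**2) for ai, bi in zip(a, b))
meanS = mu.sum() + w_mean
stdS = np.sqrt((sigma**2).sum() + w_var)

# CLT-style approximation: S ~ N(meanS, stdS), then scale by the ratios
S = meanS + stdS * rng.standard_normal(100_000)
X = S[:, None] * C                     # each row is one sample of (X1..X4)

# Every sample satisfies Xi / sum(Xi) = Ci, by construction
assert np.allclose(X / X.sum(axis=1, keepdims=True), C)
```

Note that this only approximates the constrained distribution of S; the later answers in the thread derive the exact conditional pdf.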

Paul
on 17 Jan 2021

Bruno,

Does this solution extend to the case where we remove the restriction that Ci > 0 for the ratios involving the normal RVs?

For example, if X1 and X2 are normal and X3 and X4 are Weibull, does the same solution apply if:

C = [ -0.2 0.3 0.4 0.5]

Bruno Luong
on 17 Jan 2021

I think so. In the logic we never use the fact that C >= 0.

The parametrization might extend to an unbounded interval. Numerically we just need to be careful and truncate at the places where the pdf is considered negligible.

Jeff Miller
on 3 Jan 2021

Edited: Jeff Miller
on 3 Jan 2021

It sounds like the only "free" aspect of your new samples is the new total (say, 241 instead of 240). Once you have that new total, your 16 individual random scores are in the pre-determined proportions of that new total.

So, generate new random totals in the same way as the original--i.e., as the sum of 16 separate random scores--and then compute the 16 individual new random variables as the pre-determined proportions of the new total.

Note that the 16 random variables generated in this way will no longer be normal and Weibull, however, since they are fixed proportions of a (total) random variable that is neither normal nor Weibull.

By the way, you can speed up the generation of the total of 8 independent normal random variables, since their total is a single normal with the sum of the means and the sum of the variances.
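A quick check of this speed-up (a Python sketch with hypothetical parameters, not the asker's): the sum of 8 independent normals and a single draw from N(sum of means, sqrt(sum of variances)) have the same distribution:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameters for 8 independent normals (not the asker's values)
mu = np.arange(1.0, 9.0)      # means 1..8
sigma = np.full(8, 0.5)

# Slow way: draw all 8 normals and add them up
slow = rng.normal(mu, sigma, size=(200_000, 8)).sum(axis=1)

# Fast way: one draw from N(sum of means, sqrt(sum of variances))
fast = rng.normal(mu.sum(), np.sqrt((sigma**2).sum()), size=200_000)

# Same distribution: compare the first two moments
assert abs(slow.mean() - fast.mean()) < 0.05
assert abs(slow.std() - fast.std()) < 0.05
```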

Jeff Miller
on 4 Jan 2021

I am pretty sure that there is an easier way to do this. The key is that once you know the value of one rv, you can compute the values of the other rvs from the known weights. So, you only need to generate one rv randomly.

Let's say you want to generate random values of the first normal rv, from which you can compute the rest. As you have pointed out, the conditional distribution of this first rv is not actually normal because of the weighting constraint. What is its distribution?

To simplify the terminology, I'll pretend the rvs are all discrete so I can talk about probabilities.

As an example, what is pr(rv1=5), given the constraint? Well, rv1=5 under the constraint only when it is also the case that rv2...rv16 all equal their constrained values consistent with rv1=5. Call those values rv2c, rv3c,....

Now all these rvs are independent so the joint probability is the product of their individual probabilities. Thus, pr(rv1=5|constraint) = pr(rv1=5)*pr(rv2=rv2c)*pr(rv3=rv3c)...

In principle you can do the analogous calculation for all of the different values of rv1, and thus get the distribution of rv1 under the constraint. Once you have that, you can randomly sample rv1's from the constrained distribution and compute the corresponding values of the other rvs from each random rv1.

I must admit that I am not confident about how to extend all this to continuous rvs. Maybe it is enough to compute a joint likelihood by multiplying pdfs and applying some normalization at the end, but maybe it is trickier than that.
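For a concrete (hypothetical) discrete instance of this argument, sketched in Python: take two independent discrete RVs with made-up pmfs and equal weights C1 = C2 = 0.5, so the constraint X1/(X1+X2) = 0.5 forces X2 = X1; the constrained distribution of rv1 is then the normalized product of the two pmfs:

```python
import numpy as np

# Two independent discrete RVs on {1, 2, 3} with made-up pmfs, and equal
# weights C1 = C2 = 0.5, so the constraint X1/(X1+X2) = 0.5 forces X2 = X1
p1 = np.array([0.2, 0.5, 0.3])   # pmf of rv1
p2 = np.array([0.6, 0.3, 0.1])   # pmf of rv2

# Unnormalized constrained probabilities: pr(rv1=v) * pr(rv2=v)
joint = p1 * p2
constrained = joint / joint.sum()   # pr(rv1=v | constraint) -> [0.4, 0.5, 0.1]

# Cross-check by brute force over the full 3x3 joint distribution
grid = np.outer(p1, p2)
mask = np.eye(3, dtype=bool)        # cells where rv2 == rv1
brute = grid[mask] / grid[mask].sum()
assert np.allclose(constrained, brute)
```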

Paul
on 8 Jan 2021

Edited: Paul
on 13 Jan 2021

Let Xi be a set of independent, continuous random variables, i = 1-n.

Let wi be a set of weights, i = 2-n. The goal is to find samples of Xi under the constraints Xi/sum(Xi) = wi, i = 2-n.

To solve this problem:

1. transform to a different set of random variables defined as follows: Y1 = sum(Xi), Yi = Xi/sum(Xi), i = 2-n

2. find the joint pdf of Yi

3. Find the conditional pdf of Y1 given Yi = wi, i = 2-n.

4. sample Y1 from its conditional pdf

5. generate samples of Xi from Y1 and the constraint equations.

For example assume:

X1 ~ N(0,1)

X2 ~ N(1,2)

X3 ~ W(1,1)

X4 ~ W(2,1)

% Step 0

% define the distribution parameters

mu1 = 0; sigma1 = 1;

mu2 = 1; sigma2 = 2;

a3 = 1; b3 = 1;

a4 = 1; b4 = 2;

% define the pdfs of the random variables

syms x1 x2 real

syms x3 x4 real

f_x1(x1) = 1/sigma1/sqrt(2*pi)*exp(-(x1-mu1)^2/(2*sigma1^2)); % -inf < x1 < inf

f_x2(x2) = 1/sigma2/sqrt(2*pi)*exp(-(x2-mu2)^2/(2*sigma2^2)); % -inf < x2 < inf

f_x3(x3) = piecewise(x3 <= 0, 0, x3 > 0, (b3/a3)*((x3/a3)^(b3-1))*exp(-((x3/a3)^b3))); % -inf < x3 < inf

f_x4(x4) = piecewise(x4 <= 0, 0, x4 > 0, (b4/a4)*((x4/a4)^(b4-1))*exp(-((x4/a4)^b4))); % -inf < x4 < inf

% define the joint pdf of Xi

f_x1x2x3x4(x1,x2,x3,x4) = f_x1(x1)*f_x2(x2)*f_x3(x3)*f_x4(x4);

% Step 1

% define the transformation of RVs

syms y1 y2 y3 y4 real

g1 = y1 == x1 + x2 + x3 + x4;

g2 = y2 == x2/(x1 + x2 + x3 + x4);

g3 = y3 == x3/(x1 + x2 + x3 + x4);

g4 = y4 == x4/(x1 + x2 + x3 + x4);

% Step 2

% solve for the inverse equations

h = solve([g1,g2,g3,g4],x1,x2,x3,x4,'Real',true,'ReturnConditions',true);

% find the Jacobian of the inverse solution

J(y1,y2,y3,y4) = jacobian([h.x1, h.x2, h.x3, h.x4],[y1, y2, y3, y4]);

% determinant of J

detJ(y1,y2,y3,y4) = det(J(y1,y2,y3,y4));

% joint density of Yi (edit 12 Jan 2021: math should use abs(detJ); did not change results)

f_y1y2y3y4(y1,y2,y3,y4) = f_x1x2x3x4(h.x1,h.x2,h.x3,h.x4)*abs(detJ(y1,y2,y3,y4));

% Step 3

% The conditional density of Y1 given Yi = yi is the joint density divided by the marginal density

% marginal density of Yi, i = 2-4

f_y2y3y4(y2,y3,y4) = int(f_y1y2y3y4(y1,y2,y3,y4),y1,-inf,inf);

% conditional density of Y1

f_y1giveny2y3y4(y1,y2,y3,y4) = f_y1y2y3y4(y1,y2,y3,y4)/f_y2y3y4(y2,y3,y4);

% verify conditional density using Jeff Miller's acceptance approach (https://www.mathworks.com/matlabcentral/answers/707693-generation-of-conditional-random-variables#comment_1242578)

simN = 5000000;

X1 = random('Normal',mu1,sigma1,[simN 1]);

X2 = random('Normal',mu2,sigma2,[simN 1]);

X3 = random('Weibull',a3,b3,[simN 1]);

X4 = random('Weibull',a4,b4,[simN 1]);

w2 = 0.3;

w3 = 0.5;

w4 = 0.7;

Y1 = X1 + X2 + X3 + X4;

Y2 = X2./(X1 + X2 + X3 + X4);

Y3 = X3./(X1 + X2 + X3 + X4);

Y4 = X4./(X1 + X2 + X3 + X4);

tolerance = 0.1;

accepted = abs(Y2 - w2) < tolerance & abs(Y3 - w3) < tolerance & abs(Y4 - w4) < tolerance;

figure;

subplot(311);

histogram(Y1(accepted),'Normalization','pdf');

hold on

fplot(f_y1giveny2y3y4(y1,w2,w3,w4),[0 5],'r','LineWidth',4)

title('f(y1 | y2,y3,y4), Analytic and Histogram (Acceptance)')

% Step 4

% sample Y1 using inverse sampling (numerical solution)

syms z real

f(z) = vpa(f_y1giveny2y3y4(z,w2,w3,w4));

Cdf_y1(y1) = int(f(z),-inf,y1);

y1vec = 0:.001:5; % Cdf_y1(0) = 0, Cdf_y1(5) = 0.999998034

Cvec = double(Cdf_y1(y1vec));

usamp = rand(simN,1);

Y1samp = interp1(Cvec,y1vec,usamp);

Y2samp = w2;

Y3samp = w3;

Y4samp = w4;

subplot(312)

histogram(Y1samp,'Normalization','pdf');

hold on

fplot(f_y1giveny2y3y4(y1,w2,w3,w4),[0 5],'r','LineWidth',4)

title('f(y1 | y2,y3,y4), Analytic and Histogram (Direct Inverse Sampling)')

% Step 5

% generate samples of Xi from Yi and the constraint equations

h1 = matlabFunction(h.x1);

X1samp = h1(Y1samp,Y2samp,Y3samp,Y4samp);

X2samp = w2*Y1samp;

X3samp = w3*Y1samp;

X4samp = w4*Y1samp;

subplot(313)

histogram(X1samp,'Normalization','pdf');

title('Conditional PDF of X1');

Here are the results. It seemed to work pretty well for this example. Note that Step 5 runs very fast once Steps 1-4 complete.

I don't know how well this approach scales as the number of RVs increases. There may be other gotchas as well.

Bruno Luong
on 11 Jan 2021

@Paul: why do the weights w1, w2, w3, w4 in your example not sum to 1? Does it matter for the result?

Paul
on 12 Jan 2021

They did not sum to 1 because when I wrote the answer I did not realize that was a constraint. However, I've changed the values so that they do sum to 1, and it didn't change the result.

Edit: Actually, they did sum to 1, with w1 implicitly being set to -0.5, because we must have sum(w) = 1.

Paul
on 3 Feb 2021

Here is updated code that is simpler and more flexible.

mu = [0 1]; sigma = [1 2];

a = [1 1]; b = [1 2];

% define the constants.

C = [ 0.1; 0.2; 0.3; 0.4]; % column vector

nG = numel(mu);

nW = numel(a);

C = C(:)./sum(C);

%Ccell = num2cell(C);

% number of samples

simN = 5000000;

xvar = sym('x',[1 nG+nW],'real');

yvar = sym('y',[1 nG+nW],'real');

syms pdf_normal(x,muparam,sigmaparam)

pdf_normal(x,muparam,sigmaparam)= 1/sigmaparam/sqrt(2*pi)*exp(-(x-muparam)^2/(2*sigmaparam^2)); % -inf < x < inf

syms pdf_weibull(x,aparam,bparam)

pdf_weibull(x,aparam,bparam) = piecewise(x < 0, 0, x >= 0, (bparam/aparam)*((x/aparam)^(bparam-1))*exp(-((x/aparam)^bparam)));

% define the joint pdf of Xi

syms jointpdf_X real

jointpdf_X = 1;

for ii = 1:nG

jointpdf_X = jointpdf_X*pdf_normal(xvar(ii),mu(ii),sigma(ii));

end

for ii = 1:nW

jointpdf_X = jointpdf_X*pdf_weibull(xvar(ii+nG),a(ii),b(ii));

end

jointpdf_X = piecewise(sum(xvar)==0,0,jointpdf_X); % need to exclude the line sum(xi) = 0

jointpdf_X(xvar) = jointpdf_X;

% determinant of Jacobian

detJac(yvar) = yvar(1)^(nG+nW-1);

% joint density of Y, evaluated at Xi = Y1*Ci

Xi = num2cell(yvar(1)*C);

jointpdf_Y(yvar(1)) = simplify(jointpdf_X(Xi{:})*abs(detJac));

jointpdf_Y(yvar(1)) = vpa(jointpdf_Y);

% marginal density of Y2-Yn, evaluated at Xi = Y1*Ci, i = 2-n

marginalpdf_Y = vpaintegral(jointpdf_Y(yvar(1)),yvar(1),-inf,inf);

% conditional density of Y1 (only valid when the marginal density ~= 0)

conditionalpdf_Y1(yvar(1)) = jointpdf_Y/marginalpdf_Y;

conditionalpdf_Y1 = vpa(conditionalpdf_Y1);

% Step 4

% sample Y1 using inverse sampling (numerical solution)

syms z real

f(z) = vpa(conditionalpdf_Y1(z));

Cdf_y1(yvar(1)) = int(f(z),-inf,yvar(1));

% rough estimate of the maximum practical value of S = sum(Xi) (assumes S>0)

maxY1 = 0;

for ii = 1:nG

maxY1 = maxY1 + (mu(ii)+3*sigma(ii));

end

for ii = 1:nW

maxY1 = maxY1 + icdf('Weibull',0.999,a(ii),b(ii));

end

y1vec = 0:.01:maxY1;

% Cdfvec = double(Cdf_y1(y1vec));

Cdfvec = cumtrapz(y1vec,double(conditionalpdf_Y1(y1vec)));

[Cdfvec,ii] = unique(Cdfvec);

y1vec = y1vec(ii);

if abs(1-Cdfvec(end)) > 1e-5

error('CDF(end) ~= 1');

end

Y1 = interp1(Cdfvec,y1vec,rand(simN,1));

This code yields the same results as the original answer. It works for the original question as posed (Ci > 0), and also for other admissible values of Ci (i.e., sum(Ci) = 1, with the Ci associated with the Weibull variables all having the same sign), as long as the determination of maxY1 is properly modified for the case where Y1 = sum(Xi) is negative. The important term is conditionalpdf_Y1, which I believe comes out correctly for the case where Y1 < 0 (admittedly I have not tested this case extensively, but it seemed to work as it should).

It turns out that this solution is very similar to Bruno's solution in this comment. In fact, this term

jointpdf_X(Xi{:})

is exactly the same as this term

spdf_fun = @(s) prod(arrayfun(@(f,x) feval(f{1},x), pdftab, s*C));

in Bruno's code. The difference between the solutions is that this solution multiplies jointpdf_X by the term

abs(detJac)) % == abs(y1^(nG+nW-1))

This extra term comes from the transformation of the differential volume in X space to the differential volume in Y space, which is needed to define the joint pdf in Y space before applying the conditioning to get the conditional pdf. At least that is my understanding of these types of problems.
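As a sanity check of this Jacobian factor, here is a SymPy sketch for n = 3 (the pattern generalizes): with Y1 = X1+X2+X3, Y2 = X2/Y1, Y3 = X3/Y1, the inverse map is X1 = Y1*(1-Y2-Y3), X2 = Y1*Y2, X3 = Y1*Y3, and the determinant comes out to y1^(n-1):

```python
import sympy as sp

# n = 3 case: Y1 = X1+X2+X3, Y2 = X2/Y1, Y3 = X3/Y1, so the inverse map is
# X1 = Y1*(1 - Y2 - Y3), X2 = Y1*Y2, X3 = Y1*Y3
y1, y2, y3 = sp.symbols('y1 y2 y3', real=True)
inv = sp.Matrix([y1*(1 - y2 - y3), y1*y2, y1*y3])

# Jacobian of the inverse map and its determinant
detJ = sp.simplify(inv.jacobian([y1, y2, y3]).det())
assert sp.simplify(detJ - y1**2) == 0   # i.e. y1^(n-1) for n = 3
```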

This solution can provide some insight into the symbolic form of the conditional pdf. If only a numerical answer is desired, it's probably easier to avoid the symbolic computation and use Bruno's code with one line modified:

% spdf_fun = @(s) prod(arrayfun(@(f,x) feval(f{1},x), pdftab, s*C));

spdf_fun = @(s) prod(arrayfun(@(f,x) feval(f{1},x), pdftab, s*C)).*abs(s.^(nG+nW-1));

with the caveat that, as Bruno noted, Bruno's code will need minor updates to generalize to cases where one or more of Ci < 0 is allowed, should such generalization be desired.

The conditional pdf in this solution will be proportional to

y1^(nG+nW-1+sum(b~=1))

which can be a pretty large exponent. For the original question it will be in the range [15, 23] depending on the b parameters of the Weibull variables. So there may be some computational issues associated with raising numbers to a large power in intermediate computations, depending on the RV parameters, if using only the numerical solution.
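For readers who want the purely numerical route outside MATLAB, here is a Python sketch of the same idea (using the hypothetical 4-RV parameters from the example above, not the original 16-RV question): build the unnormalized conditional pdf of S as the product of the marginal pdfs evaluated at Ci*s, times the Jacobian factor |s|^(n-1), then sample S by inverse-transform sampling on a grid:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4-RV example from above: 2 normals, 2 Weibulls (scale a, shape b)
C = np.array([0.1, 0.2, 0.3, 0.4])
mu, sigma = [0.0, 1.0], [1.0, 2.0]
a, b = [1.0, 1.0], [1.0, 2.0]
n = 4

def npdf(x, m, s):
    return np.exp(-(x - m)**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

def wpdf(x, a_, b_):
    out = np.zeros_like(x)
    pos = x > 0
    out[pos] = (b_/a_) * (x[pos]/a_)**(b_ - 1) * np.exp(-(x[pos]/a_)**b_)
    return out

def spdf_unnorm(s):
    """Unnormalized conditional pdf of S: product of the marginal pdfs at
    Ci*s, times the Jacobian factor |s|^(n-1) that Paul adds to Bruno's code."""
    p = npdf(C[0]*s, mu[0], sigma[0]) * npdf(C[1]*s, mu[1], sigma[1])
    p = p * wpdf(C[2]*s, a[0], b[0]) * wpdf(C[3]*s, a[1], b[1])
    return p * np.abs(s)**(n - 1)

# Normalize on a grid (here S >= 0, since the Weibull weights are positive)
svec = np.linspace(0.0, 12.0, 4001)
pdf = spdf_unnorm(svec)
ds = svec[1] - svec[0]
cdf = np.concatenate([[0.0], np.cumsum((pdf[1:] + pdf[:-1]) / 2) * ds])
cdf = cdf / cdf[-1]

# Inverse-transform sampling of S, then scale by the ratios
S = np.interp(rng.random(100_000), cdf, svec)
X = S[:, None] * C     # constrained samples of (X1..X4)
```

The grid upper limit (12) and resolution are ad hoc choices for these parameters; as Paul notes above, they would need to be adapted to the RV parameters, and to negative S, in other cases.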
