The proper way to sample 3 normally or lognormal distributed variables added up to 1

Question

Jesse Chao el 2 de En. de 2023

0
Enlazar

Enlace directo a esta pregunta

https://es.mathworks.com/matlabcentral/answers/1887397-the-proper-way-to-sample-3-normally-or-lognormal-distributed-variables-added-up-to-1

Comentada: Paul el 5 de En. de 2023

Hello team,

Is there any way to generate 3 normally or lognormal distributed variables added up to 1?

For example, I have a human tissue with a volume as 1 L. And there are three components vascular space (R_vas), interstitial space (R_int), cellular space (R_cell) to composite this tissue. Thus, in this case, R_vas + R_int + R_cell = 1.

To further consider variation in different human indivisual, I would like to normally or lognormally sample these three parameters (R_vas, R_int and R_cell have a mean 0.1, 0.1, 0.8, and coefficient variation 0.3, 0.3, 0.3, respectively) but ensure that these three parameters add up to 1.

I don't think I could just randomly sample two of the parameters and use 1 to minus the sum of the two randomly smapled parameters to get the third one. In this case, I think the third parameter is not followed the predefined mean and coefficient variation. or is it?

How could I achieve this task?

It will be great to have your help.

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Iniciar sesión para comentar.

Iniciar sesión para responder a esta pregunta.

Answer 1

John D'Errico el 2 de En. de 2023

0
Enlazar

Enlace directo a esta respuesta

https://es.mathworks.com/matlabcentral/answers/1887397-the-proper-way-to-sample-3-normally-or-lognormal-distributed-variables-added-up-to-1#answer_1139822

Editada: John D'Errico el 2 de En. de 2023

Abrir en MATLAB Online

Let me answer your second question separately. How would you sample three NORMALLY distributed random variables that sum to 1? This part is easy, since it merely requires you to properly construct the necessary covariance matrix.

First, assume we have three variables, with means and standard deviations:

mu = [.1 .1 .8];
sigma = [.3 .3 .3];

But really, we need a covariance matrix that is not a diagonal one. I'll construct it as if I knew the eigenvalue decomposition.

Q = [null([1 1 1]),[1;1;1]/sqrt(3)]
Q = 3×3
   -0.5774   -0.5774    0.5774
    0.7887   -0.2113    0.5774
   -0.2113    0.7887    0.5774
C = Q*diag(3/2*[.3^2 .3^2 0])*Q'
C = 3×3
    0.0900   -0.0450   -0.0450
   -0.0450    0.0900   -0.0450
   -0.0450   -0.0450    0.0900

As you can see, the VARIANCES (so the sqrt of the diagonal elements) of the covariance matrix are each 0.3. But they are now correlated.

Now we can generate set of numbers that have the desired property, at least over the long term, and within floating point trash. What I did was to carefully construct a singular covariance matrix.

X = mvnrnd(mu,C,10000);

hist(sum(X,2))

mean(X)

ans = 1×3

0.1037 0.1011 0.7952

std(X)

ans = 1×3

0.2987 0.3059 0.2997

format long g

[min(sum(X,2)),max(sum(X,2))]

ans = 1×2

0.9999999675926 1.00000003250349

So to within floating point trash, they sum to 1. At least as close as we can come based on double precision arithmetic in what I did. They have the desired means and variances.

Note that these variables have a problem perhaps in what you are doing, in that some of those elements will often be negative.

hist(min(X,[],2))

That is a likely event when the sum is required to be 1, and you are asking about NORMALly distributed random variables. This is also a reason why your goal MUST fail for the lognormal case.

8 comentarios
Mostrar 6 comentarios más antiguosOcultar 6 comentarios más antiguos

Paul el 5 de En. de 2023

Abrir en MATLAB Online

I would attack the problem this way

N = [eye(2);[-1 -1]];

N forms a basis for the null space of [1 1 1]

[1 1 1]*N
ans = 1×2
     0     0

Define C as follows

syms sigma_x sigma_y rho_xy

C = N*[sigma_x^2 rho_xy*sigma_x*sigma_y;rho_xy*sigma_x*sigma_y sigma_y^2]*N.'

C =

We see that we can define the covariance of X and Y (upper 2x2 block), but we get what we get for the variance of Z and its correlation with the other two variables.

Example from @John D'Errico formed by

C = N*[.3^2 -0.5*0.3*.3;-0.5*.3*.3 .3^2]*N'
C = 3×3
    0.0900   -0.0450   -0.0450
   -0.0450    0.0900   -0.0450
   -0.0450   -0.0450    0.0900

But we can pick other parameters. For example

C = N*[0.3^2 0.7*0.3*1.3;0.7*0.3*1.3 1.3^2]*N'

C = 3×3

0.0900 0.2730 -0.3630 0.2730 1.6900 -1.9630 -0.3630 -1.9630 2.3260

mu = [0.2 0.3 0.5]; % must sum to 1

histogram(sum(mvnrnd(mu,C,1e5),2));

Geometrically, the ellipsoid is collapsing down to an ellipse centered on mu and lying in the plane x+y+z=1. There's probably even a further degenerate case where the ellipsoid collapses to a line going through my and lying in that plane.

Bruno Luong el 5 de En. de 2023

The problem is - as John pointed out - x, y, z ca get negative values. If that negative quantities has physical interpretation for fractional volume of human tissu then it's OK.

Paul el 5 de En. de 2023

As John did, I'm just addressing the question of how to specify mu and C such that mvnrnd(mu,C,n) returns values that sum to unity. Maybe that's not the correct question to ask for what the OP is really interested in.

Iniciar sesión para comentar.

Answer 2

Torsten el 2 de En. de 2023

0
Enlazar

Enlace directo a esta respuesta

https://es.mathworks.com/matlabcentral/answers/1887397-the-proper-way-to-sample-3-normally-or-lognormal-distributed-variables-added-up-to-1#answer_1139802

You can define a common probability distribution of the three variables on the triangle

x + y + z = 1, x, y, z >= 0

but this cannot be a normal or lognormal distribution for each of the random variables as you suggested.

You can also divide the result of individual sampling by R_vas + R_int + R_cell, but this will change the assumed distributions for R_vas, R_int and R_cell.

1 comentario
Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos

Jesse Chao el 5 de En. de 2023

Hello Torsten,

Thank you for the suggestions and reminders.

Jesse

Iniciar sesión para comentar.

Answer 3

John D'Errico el 2 de En. de 2023

0
Enlazar

Enlace directo a esta respuesta

https://es.mathworks.com/matlabcentral/answers/1887397-the-proper-way-to-sample-3-normally-or-lognormal-distributed-variables-added-up-to-1#answer_1139807

Editada: John D'Errico el 2 de En. de 2023

Magic?

You have three variables, with means that will at least get you in the right ballpark. The goal however, its to insure the sum is exactly 1. What property does a lognormal variable have? You can generate one by generating a normally distributed random variable, and then exponentiating it. So effectively, the log of a lognormal variate, is Normally distributed.

Now, if your goal was to find three lognormally distributed variables, where the PRODUCT was 1, this problem would be far easier. You find a set of three normal variables where the sum was zero. When you exponentiate, the product is automatically 1. And finding a set of normal variates with a sum of zero is almost trivial. (No reason to get into that here, as it is not pertinent to your problem.)

Anyway, you are correct in that you CANNOT just sample two variates, then choose the third to force that sum to 1. But let me back up. Does this entire question even make mathematical sense? Sadly, no. Remember that a lognormal variable lives on the interval (0,inf). So there is a strong chance that any one of those variables themselves are greater than 1. And when that happens, the sum can NEVER be 1 because the other two terms in the sum can never be negative. So you cannot have a situation where the sum is constrained to be a constant. Sorry.

At best, you could think about a set of TRUNCATED lognormal variables, that sum to 1. And even then, the truncation point would be difficult to quantify.

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Iniciar sesión para comentar.

Answer 4

Bruno Luong el 2 de En. de 2023

0
Enlazar

Enlace directo a esta respuesta

https://es.mathworks.com/matlabcentral/answers/1887397-the-proper-way-to-sample-3-normally-or-lognormal-distributed-variables-added-up-to-1#answer_1139837

Editada: Bruno Luong el 2 de En. de 2023

Abrir en MATLAB Online

This will generate 3 random variables positives and sum to 1, with mean 8/10, 1/10, 1/10:

nu=[8 1 1];
B=arrayfun(@(n) ones(1,n),nu,'unif', 0);
A=blkdiag(B{:});
m=size(A,2);
p=3; % larger p will give smaller std
% this two lines will generate proper uniform conditioninf distributions,
% row-sum to 1 for all p columns
r=-log(rand(m,p)); 
r = sum(r./sum(r,1),2)/p;
%
r=A*r;
r
r = 3×1
    0.7922
    0.1397
    0.0680

It will NOT be normal, but it kind of "normal" in the sense of central limit theorem, i.e. a limit of sum of a "large" independent random variables.

2 comentarios
Mostrar NingunoOcultar Ninguno

Bruno Luong el 3 de En. de 2023

Editada: Bruno Luong el 3 de En. de 2023

Abrir en MATLAB Online

Here is the histogram of the results using my code. It will not Gaussian curve but skewed bell shape:

When you increase p, the histogram will be narrower but less skewed.

nu=[8 1 1];
B=arrayfun(@(n) ones(1,n),nu,'unif', 0);
A=blkdiag(B{:});
m=size(A,2);
p=3; % larger p will give smaller std
rtab = zeros(size(A,1),1e6);
for k=1:size(rtab,2)
    % this two lines will generate proper uniform conditioninf distributions,
    % row-sum to 1 for all p columns
    r=-log(rand(m,p));
    r = sum(r./sum(r,1),2)/p;
    %
    rtab(:,k)=A*r;
end
figure
subplot(2,2,1)
histogram(rtab(1,:))
subplot(2,2,3)
histogram(rtab(2,:))
subplot(2,2,4)
histogram(rtab(3,:))

Run with p = 20

Jesse Chao el 5 de En. de 2023

Hello Bruno,

Thank you for the solutions.

This somehow could solve the problem.

Jesse

Iniciar sesión para comentar.

The proper way to sample 3 normally or lognormal distributed variables added up to 1

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Respuesta aceptada

8 comentarios
Mostrar 6 comentarios más antiguosOcultar 6 comentarios más antiguos

Más respuestas (3)

1 comentario
Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

2 comentarios
Mostrar NingunoOcultar Ninguno

Ver también

Categorías

Etiquetas

Productos

Versión

Community Treasure Hunt

The proper way to sample 3 normally or lognormal distributed variables added up to 1

0 comentarios Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Respuesta aceptada

8 comentarios Mostrar 6 comentarios más antiguosOcultar 6 comentarios más antiguos

Más respuestas (3)

1 comentario Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos

0 comentarios Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

2 comentarios Mostrar NingunoOcultar Ninguno

Ver también

Categorías

Etiquetas

Productos

Versión

Community Treasure Hunt

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

8 comentarios
Mostrar 6 comentarios más antiguosOcultar 6 comentarios más antiguos

1 comentario
Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

2 comentarios
Mostrar NingunoOcultar Ninguno