The proper way to sample 3 normally or lognormal distributed variables added up to 1
3 visualizaciones (últimos 30 días)
Mostrar comentarios más antiguos
Jesse Chao
el 2 de En. de 2023
Comentada: Paul
el 5 de En. de 2023
Hello team,
Is there any way to generate 3 normally or lognormal distributed variables added up to 1?
For example, I have a human tissue with a volume as 1 L. And there are three components vascular space (R_vas), interstitial space (R_int), cellular space (R_cell) to composite this tissue. Thus, in this case, R_vas + R_int + R_cell = 1.
To further consider variation in different human indivisual, I would like to normally or lognormally sample these three parameters (R_vas, R_int and R_cell have a mean 0.1, 0.1, 0.8, and coefficient variation 0.3, 0.3, 0.3, respectively) but ensure that these three parameters add up to 1.
I don't think I could just randomly sample two of the parameters and use 1 to minus the sum of the two randomly smapled parameters to get the third one. In this case, I think the third parameter is not followed the predefined mean and coefficient variation. or is it?
How could I achieve this task?
It will be great to have your help.
0 comentarios
Respuesta aceptada
John D'Errico
el 2 de En. de 2023
Editada: John D'Errico
el 2 de En. de 2023
Let me answer your second question separately. How would you sample three NORMALLY distributed random variables that sum to 1? This part is easy, since it merely requires you to properly construct the necessary covariance matrix.
First, assume we have three variables, with means and standard deviations:
mu = [.1 .1 .8];
sigma = [.3 .3 .3];
But really, we need a covariance matrix that is not a diagonal one. I'll construct it as if I knew the eigenvalue decomposition.
Q = [null([1 1 1]),[1;1;1]/sqrt(3)]
C = Q*diag(3/2*[.3^2 .3^2 0])*Q'
As you can see, the VARIANCES (so the sqrt of the diagonal elements) of the covariance matrix are each 0.3. But they are now correlated.
Now we can generate set of numbers that have the desired property, at least over the long term, and within floating point trash. What I did was to carefully construct a singular covariance matrix.
X = mvnrnd(mu,C,10000);
hist(sum(X,2))
mean(X)
std(X)
format long g
[min(sum(X,2)),max(sum(X,2))]
So to within floating point trash, they sum to 1. At least as close as we can come based on double precision arithmetic in what I did. They have the desired means and variances.
Note that these variables have a problem perhaps in what you are doing, in that some of those elements will often be negative.
hist(min(X,[],2))
That is a likely event when the sum is required to be 1, and you are asking about NORMALly distributed random variables. This is also a reason why your goal MUST fail for the lognormal case.
8 comentarios
Bruno Luong
el 5 de En. de 2023
The problem is - as John pointed out - x, y, z ca get negative values. If that negative quantities has physical interpretation for fractional volume of human tissu then it's OK.
Paul
el 5 de En. de 2023
As John did, I'm just addressing the question of how to specify mu and C such that mvnrnd(mu,C,n) returns values that sum to unity. Maybe that's not the correct question to ask for what the OP is really interested in.
Más respuestas (3)
Torsten
el 2 de En. de 2023
You can define a common probability distribution of the three variables on the triangle
x + y + z = 1, x, y, z >= 0
but this cannot be a normal or lognormal distribution for each of the random variables as you suggested.
You can also divide the result of individual sampling by R_vas + R_int + R_cell, but this will change the assumed distributions for R_vas, R_int and R_cell.
John D'Errico
el 2 de En. de 2023
Editada: John D'Errico
el 2 de En. de 2023
Magic?
You have three variables, with means that will at least get you in the right ballpark. The goal however, its to insure the sum is exactly 1. What property does a lognormal variable have? You can generate one by generating a normally distributed random variable, and then exponentiating it. So effectively, the log of a lognormal variate, is Normally distributed.
Now, if your goal was to find three lognormally distributed variables, where the PRODUCT was 1, this problem would be far easier. You find a set of three normal variables where the sum was zero. When you exponentiate, the product is automatically 1. And finding a set of normal variates with a sum of zero is almost trivial. (No reason to get into that here, as it is not pertinent to your problem.)
Anyway, you are correct in that you CANNOT just sample two variates, then choose the third to force that sum to 1. But let me back up. Does this entire question even make mathematical sense? Sadly, no. Remember that a lognormal variable lives on the interval (0,inf). So there is a strong chance that any one of those variables themselves are greater than 1. And when that happens, the sum can NEVER be 1 because the other two terms in the sum can never be negative. So you cannot have a situation where the sum is constrained to be a constant. Sorry.
At best, you could think about a set of TRUNCATED lognormal variables, that sum to 1. And even then, the truncation point would be difficult to quantify.
0 comentarios
Bruno Luong
el 2 de En. de 2023
Editada: Bruno Luong
el 2 de En. de 2023
This will generate 3 random variables positives and sum to 1, with mean 8/10, 1/10, 1/10:
nu=[8 1 1];
B=arrayfun(@(n) ones(1,n),nu,'unif', 0);
A=blkdiag(B{:});
m=size(A,2);
p=3; % larger p will give smaller std
% this two lines will generate proper uniform conditioninf distributions,
% row-sum to 1 for all p columns
r=-log(rand(m,p));
r = sum(r./sum(r,1),2)/p;
%
r=A*r;
r
It will NOT be normal, but it kind of "normal" in the sense of central limit theorem, i.e. a limit of sum of a "large" independent random variables.
2 comentarios
Bruno Luong
el 3 de En. de 2023
Editada: Bruno Luong
el 3 de En. de 2023
Here is the histogram of the results using my code. It will not Gaussian curve but skewed bell shape:
When you increase p, the histogram will be narrower but less skewed.
nu=[8 1 1];
B=arrayfun(@(n) ones(1,n),nu,'unif', 0);
A=blkdiag(B{:});
m=size(A,2);
p=3; % larger p will give smaller std
rtab = zeros(size(A,1),1e6);
for k=1:size(rtab,2)
% this two lines will generate proper uniform conditioninf distributions,
% row-sum to 1 for all p columns
r=-log(rand(m,p));
r = sum(r./sum(r,1),2)/p;
%
rtab(:,k)=A*r;
end
figure
subplot(2,2,1)
histogram(rtab(1,:))
subplot(2,2,3)
histogram(rtab(2,:))
subplot(2,2,4)
histogram(rtab(3,:))
Run with p = 20
Ver también
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!