Randomly dividing an integer (non-uniform distribution)

47 views (last 30 days)
Sordin on 11 May 2019
Answered: Kaan Çamur on 23 Nov 2020
I am trying to write a code that randomly divides a number N into M parts (in other words, I want to generate M random numbers whose sum adds up to N). However, I want to allow the generated numbers to vary considerably in magnitude.
My current code generates numbers that are all in the same ballpark:
N = 1;
n = rand(1,M);
TOT = sum(n);
numbers = (n/TOT)*N
For instance, for , it yields:
0.0716 0.0184 0.0498 0.0204 0.0748 0.0528 0.0120 0.0194 0.0650 0.0721 0.0459 0.0744 0.0785 0.0248
0.0414 0.0326 0.0786 0.0778 0.0608 0.0290
As one can see, the number is so evenly distributed that all generated numbers have the same (or close) orders of magnitude.
Is there a simple way to tweak this code so that it can generate numbers that are widely different in magnitude (e.g. 0.7 and 0.01)?
Any suggestions would be greatly appreciated.

Jeff Miller on 12 May 2019
There are any number of ways to do that, depending on exactly what you mean by "considerably". Here are a few:
%% Transform the numbers so they aren't uniform to begin with.
M=20;
N = 1;
n = rand(1,M).^10; % Increase the exponent to increase the size variation. A lot of numbers will be near zero.
TOT = sum(n);
numbers = (n/TOT)*N
%% Generate progressively smaller numbers by generating each new one as a uniform(0,1) fraction of
% what is left of your total.
M=20;
n = rand(1,M);
TOT = sum(n);
cumtot = cumsum(n(1:end-1));
n(1:end-1) = n(1:end-1) .* (TOT - cumtot);
TOT = sum(n);
numbers = (n/TOT)*N
% permute(numbers) here if you don't want larger numbers to be systematically first.

John D'Errico on 12 May 2019
Edited: John D'Errico on 12 May 2019
The problem with the alternative answers given, is they tend to have a flaw, in that they do not sample the space of interest well. It is the same reason why those schemes fail, when you try to sample uniformly, but with a fixed sum. The problem is though, the goal is a difficult one.
First, rather than saying we want to just generically say we want to randomly sample, first, we need to indicate a distribution. My logical initial choice would be an exponential distribution. Thus, can we sample points from an exponential random variable, such that the sum of the set is some fixed value? In fact, even this seems difficult, because an exponential random variable can yield samples that are infinitely large. But, suppose we used the classic (and arguably problematic) approach?
For example, I want to generate a set of M values, such that the sum is N, but the distribution is something based on an exponential random variable?
This means we want to find a random set that lies within a planar region in M dimensions, but is also exponentially distributed? Ugh. So let me try an obvious solution, and see what happens.
% number of samples of this process
numsamples = 10000;
% numdim == M.
% I've picked M=2 because I will want to plot things.
% I'll redo it in a higher number of dimensions afterwards.
numdim = 2;
% it seems not to matter what the exponential rate parameter
% is, because an exponential generates samples that go out to
% infinity. So I'll arbitrarily pick lambda == 1. I'm going
% to rescale them anyway. (lambda is the inverse of the mean
% of the exponential.)
lambda = 1;
% we can just use exprnd to generate the samples. But
% some people do not have the stats toolbox, and an
% exponential is trivial to work with.
% https://en.wikipedia.org/wiki/Exponential_distribution
% On that page, we see that an exponential with rate
% parameter lambda has mean 1/lambda. (Note that the
% stats TB function exprnd uses parameter mu, the mean
% as the exponential parameter.)
X = -log(rand(numsamples,numdim))/lambda;
% So, initially, look at the distribution of the columns of X.
% done BEFORE re-scaling to have a unit sum.
hist(X(:,1),100) As you should see, the columns of X are (initially) exponentially distributed.
% Now just rescale them to have the desired sum.
% Note that this next line uses an ability introduced
% into MATLAB with R2016b.
X = X./sum(X,2);
% plot a histogram of X(:,1), post-rescaling.
hist(X(:,1),100) Which looks surprsingly uniformly distributed! (Something I would need to think about.)
My point in all of this, is you need to carefully think about the implications of the re-scaling operation, and what it does to the distribution of the numbers as generated.
Perhaps a better choice might be to use a beta random variable initially. So, in 3-dimensions, with M==3...
X = betarnd(.25,.25,[numsamples,3]);
X = X./sum(X,2);
hist(X(:,1),100) So there are many ssamples with very small values of each variable, as well as a spike near the top end of the histogram too.
Plotting now in 3-dimensions, we see:
plot3(X(:,1),X(:,2),X(:,3),'.')
box on
grid on Which is an interesting plot in itself. Again though, the re-sclaing operation does strange, and sometimes nasty things to what you should expect.
pedro on 18 May 2019
The joys of broken sticks...
John wrote (abbreviated)
X = X./sum(X,2);
hist(X(:,1),100)
Which looks surprisingly uniformly distributed! (Something I would need to think about.)
I agree. It does. (I did.)
>> Y = X./sum(X,1);
>> hist(Y(:,1),100)
John, if you're reading this (and even if you're not), please let me take this opportunity to thank you very very much indeed for your very many contributions to these discussions, which have educated and entertained me beyond measure.
cheers,
pedro

Erivelton Gualter on 11 May 2019
How about you generate n variable from a desired range. For example:
n = (b-a)*rand(1,M)+a;
So you have,
% Range
a = 0.01;
b = .7;
N = 1;
M = 20;
n = (b-a)*rand(1,M)+a; % 1xM random numbers from: [a, b]
TOT = sum(n);
numbers = (n/TOT)*N;

Kaan Çamur on 23 Nov 2020
In case of anyone needs an online tool for a quick solution, try this -> https://functioncube.com/Functions/Divide-Number

R2019a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!