Generate random numbers given distribution/histogram

MATLAB provides built-in functions to generate random numbers with an uniform or Gaussian (normal) distribution. My question is: if I have a discrete distribution or histogram, how can I can generate random numbers that have such a distribution (if the population (numbers I generate) is large enough)?
Please post here if anyone knows of a good method of doing this.
Thanks, David

 Respuesta aceptada

Since nobody has any suggestions, here's one. If you have a discrete distribution, say it is a Nx2 matrix PD, first column the discrete values, second the probabilities of the corresponding value -- so sum(PD(:,2))==1.
Then map the probablities to the unit interval and use rand. What mean by that:
% Those are your values and the corr. probabilities:
PD =[
1.0000 0.1000
2.0000 0.3000
3.0000 0.4000
4.0000 0.2000];
% Then make it into a cumulative distribution
D = cumsum(PD(:,2));
% D = [0.1000 0.4000 0.8000 1.0000]'
Now for every r generated by rand, if it is between D(i) and D(i+1), then it corresponds to an outcome PD(1,i+1), with the obvious extension at i==0. Here's a way you could do that, even though I'm sure there are better ones:
R = rand(100,1); % Your trials
p = @(r) find(r<pd,1,'first'); % find the 1st index s.t. r<D(i);
% Now this are your results of the random trials
rR = arrayfun(p,R);
% Check whether the distribution looks right:
hist(rR,1:4)
% It does, roughly 10% are 1, 30% are 2 and so on
If you want more help you should post a minimal example of the form in which you have the discrete distribution.

5 comentarios

David C
David C el 3 de En. de 2013
Thanks.
It is more efficient to use 'histc' rather than 'find'. For input arguments to 'histc' the 'rand' values would be the first argument and the 'cumsum' values the 2nd ('edges') argument. The second argument of the output will then have pointers to the corresponding discrete values as the desired generated random numbers.
Eric Auld
Eric Auld el 24 de En. de 2018
Editada: Eric Auld el 24 de En. de 2018
Hi Roger, I'm wondering if the new 'histcounts' is just different than 'histc', or if I'm not understanding. According to the documentation, and to my experiments, the second output of histcounts will be the same as the second input, when the second input is an array. Thanks
one change is required:
p = @(r) find(r<d,1,'first'); % find the 1st index s.t. r<D(i);
Overall code may be written (with some slight modifications) as:
dx = 0.001;
x = -100 : dx : 100; % limits depending on the random variable definition.
% For example, (dx : dx : 100) for Rayleigh (0 not recommended).
f = PDF(x);
F = cumsum(f) * dx; % CDF
F = F / F(end); % recommended when max(F) is close to 1. Otherwise increase x - points to achieve F(end) - max value - close to 1.
N = 10000;
U = rand(1, N);
pdf2rand = @(u) x(find(u <= F, 1, 'first'));
X = arrayfun(pdf2rand, U);
X % desired random variable points
function f = PDF(x)
% Probability Distribution Function
...
end

Iniciar sesión para comentar.

Más respuestas (2)

Image Analyst
Image Analyst el 28 de Oct. de 2012
I didn't notice your question or I would have answered, especially since it's asked so often. Try RANDRAW ( http://www.mathworks.com/matlabcentral/fileexchange/7309-randraw) for a list of common distributions. Or to refresh yourself on the theory of why using the CDF works, see Wikipedia: http://en.wikipedia.org/wiki/Inverse_transform_sampling
Theron FARRELL
Theron FARRELL el 30 de Abr. de 2019
Editada: Theron FARRELL el 30 de Abr. de 2019
Hi there,
I use this naive function to generate artificial outliers applied in machine learning. Hope that it will be a bit help in your case.
function [Out_Data, Out_PDF, CHist] = Complement_PDF(Hist, Data_Num, p)
% Generate a 1D vector of data with a PDF specified as the complementary PDF of input historgram. Note that the larger
% Data_Num is, the more Out_PDF will resemble to CHist
% Input
% Hist: PDF/Histogram of data
% Data_Num: Desired number of data to be generated
% p: Precision given by number of digits after 0
% Output
% Out_Data: Generated data as per the complementary PDF
% Out_PDF: The complementary PDF as per Out_Data
% CHist: The complementary PDF as per Hist
% Example
% Hist = [1, 6, 7, 100, 0, 0, 0, 2, 3, 5];
% Data_Number = 100000;
% p = 3
Hist = Hist/sum(Hist);
CHist = 1- Hist;
CHist = CHist/sum(CHist);
CDF_CHist = cumsum(CHist);
CDF_CHist = double(int32(CDF_CHist*10^p))/10^p;
Out_Data = zeros(1, Data_Num);
Out_PDF = zeros(1, length(CDF_CHist));
for i = 1:Data_Num
% Generate a uniformly distributed variable
x = double(int32(rand*10^p))/10^p;
% Inversely index CDF
Out_Data(i) = Inverse_CDF(x, CDF_CHist);
temp = floor(Out_Data(i) * length(CDF_CHist));
Out_PDF(temp) = Out_PDF(temp) + 1;
end
figure;
subplot 221, bar(Hist);
subplot 222, bar(CHist);
subplot 223, plot(CDF_CHist);
subplot 224, bar(Out_PDF);
end
function [y] = Inverse_CDF(x, CDF_CHist)
CDF_CHist_Ext = [0, CDF_CHist];
y = 1;
for ind = 1:length(CDF_CHist)
if (x >= CDF_CHist_Ext(ind)) && (x < CDF_CHist_Ext(ind+1))
y = ind/length(CDF_CHist);
break;
end
end
end

Categorías

Más información sobre Random Number Generation en Centro de ayuda y File Exchange.

Preguntada:

el 25 de Oct. de 2012

Comentada:

el 25 de Mayo de 2022

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by