Check if 2D points are evenly distributed
Hi,
I have a plane and a few data points (2 to 20) on that plane. Now I want to know how evenly they are distributed.
For example, if I have 4 points and one is in each corner, that would be perfect. If all 4 points are in the same spot, that would be the worst case.
I am not really interested in the perfect way to calculate this. I want to do this very often so my main focus is on speed.
I tried to find a solution to this problem but I think I don't know what I am even looking for. I hope someone can give me the right idea.
Best Regards
Accepted Answer
More Answers (2)
Image Analyst
on 1 Mar 2015
4 votes
Professor Adrian Baddeley is a leading researcher in spatial statistics and a Science Fellow with CSIRO and University of Western Australia. He wrote the bible on spatial statistics. He gives a workshop on "Analysing spatial point patterns in R". His book and papers may still be available online.
In his papers you'll see that a pattern of points can range from regular and periodic (like an array or grid), to random and uniform (a Poisson distribution, like raindrops on the ground), to clustered and clumped.

And of course there are states in between. The degree to which the pattern matches one of those classifications can be measured, and he gives algorithms for doing that. For example he says "One simple diagnostic for dependence between points is a Morishita plot. The spatial domain is divided into quadrats, and the chi square statistic based on quadrat counts is computed. The quadrats are repeatedly subdivided. The Morishita plot shows the chi square statistic against the linear size of the quadrats."
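As a rough illustration of the quadrat-count idea in that quotation, here is a minimal sketch in base MATLAB. It assumes the points lie in the unit square. The variable names (`patterns`, `kList`, `chi2`) and the two test patterns are illustrative choices, not taken from Baddeley's work. It only computes the chi-square statistic at two quadrat sizes and does not reproduce his full Morishita plot.

```matlab
% Chi-square statistic of quadrat counts at successive subdivisions.
% Assumption: every pattern is an n-by-2 matrix with coordinates in [0, 1).
[gx, gy] = meshgrid((0.5:1:3.5)/4);            % 16 points on a regular 4-by-4 grid
patterns = {[gx(:) gy(:)], ...                 % pattern 1: perfectly regular
            repmat([0.1 0.1], 16, 1)};         % pattern 2: all points in one spot
kList = [2 4];                                 % quadrats per side at each level
chi2 = zeros(numel(patterns), numel(kList));
for p = 1:numel(patterns)
    pts = patterns{p};
    n = size(pts, 1);
    for i = 1:numel(kList)
        k = kList(i);
        idx = min(floor(pts*k) + 1, k);        % quadrat index (1..k) per coordinate
        counts = accumarray(idx, 1, [k k]);    % number of points in each quadrat
        expected = n / k^2;                    % expected count under uniformity
        chi2(p, i) = sum((counts(:) - expected).^2) / expected;
    end
end
quadratSize = 1 ./ kList;                      % linear size of the quadrats
disp(chi2)                                     % rows: patterns, columns: quadrat sizes
```

For the regular grid the statistic is 0 at both quadrat sizes, while for the single clump it grows as the quadrats shrink. Plotting `chi2` against `quadratSize` gives the kind of diagnostic the quotation describes.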
He uses real world cases like locations of trees and ants. You can even have multiple populations like 4 species of trees and 2 species of ants. Besides answering questions like how uniformly distributed are the individuals, you can also answer questions like is tree species 1 preferentially near tree species 2? Or are black ant hills preferentially near red ant hills? If you're interested in spatial statistics, I suggest you look up his books and papers.
4 comments
Star Strider
on 1 Mar 2015
WOW! Thank you Image Analyst!
I’ll look for Prof. Baddeley’s works, since they would seem to be spot on with respect to analysing photomicrographs of cells in peripheral blood smears and tissue sections, the relative numbers and distributions of which are usually important.
This sounds interesting from a theoretical perspective as well. I’d like to learn the maths behind the statistics.
David Young
on 1 Mar 2015
Yes, extremely interesting.
Joe
on 25 Jun 2015
This paper was very useful and shows the equations to use to figure out how evenly points are distributed.
Paper titled: "Clustering, Randomness, and Regularity: Spatial Distributions and Human Performance on the Traveling Salesperson Problem and Minimum Spanning Tree Problem" from Purdue
Star Strider
on 28 Feb 2015
I don’t know how involved you want to get. I’m not aware of a statistical test for ‘evenly distributed’, so I created one that might work for you. It looks at the row and column distributions of the points on the plot, does a linear regression on that and does a simple statistical test (confidence intervals) on the slope. If the slope is not statistically different from zero (that is, it is not needed in the regression, so the confidence limits include zero), you can assume the row and column indices are essentially ‘evenly distributed’. (I used the normal distribution 95% confidence intervals, ±1.96. A t-distribution would be more accurate.)
Experiment with this to fit your application, interpret the results as you wish:
N = 9; % Grid Size (Coordinates Are Integers 1..N)
P = randi(N, 20, 2); % Create 20 Points
bins = 1:N; % Bins For ‘histc’
den = [ones(N,1) bins']; % Regression Design Matrix [Intercept Slope]
XTX = den'*den; % X'X, Used For The Coefficient Covariance
Kr = histc(P(:,1), bins); % Row Counts
Br = den\Kr; % Linear Regression On Row Counts
Kc = histc(P(:,2), bins); % Column Counts
Bc = den\Kc; % Linear Regression On Column Counts
df = N-2; % Degrees-Of-Freedom: Bins-Parameters
CovBr = (sum((Kr-den*Br).^2)/df)*inv(XTX); % Covariance Matrix For ‘Br’
CovBc = (sum((Kc-den*Bc).^2)/df)*inv(XTX); % Covariance Matrix For ‘Bc’
MrCI = [-1.96 1.96]*sqrt(CovBr(2,2)) + Br(2); % Row-Slope Confidence Intervals
McCI = [-1.96 1.96]*sqrt(CovBc(2,2)) + Bc(2); % Column-Slope Confidence Intervals
figure(1)
plot(P(:,1), P(:,2), 'bp')
grid on
axis([0 10 0 10])
6 comments
John D'Errico
on 1 Mar 2015
Actually, it has been a while, but I recall there is a statistical test for an "even distribution" of points. It is called the "Kolmogorov-Smirnov test". You would use that test to compare the sampling of your data points to a uniform distribution.
Why use it? For example, the idea that Star has proposed will fail for a simple u-shaped non-uniformity pattern. So, leave all of the bins of a histogram unchanged, but simply add some extra points to the first and last bins of the histogram. Or put relatively fewer points in the first and last bins of that same histogram.
There are other patterns that will defeat the linear regression test Star proposed. Star looks only at the marginal distributions, in x and then in y, of your 2-D sampling. The problem is that there are many ways such a test can fail, since it is trivial to construct a very non-uniform sampling that has perfectly uniform marginal distributions.
A simple way for that to fail is if your points all lie along the line y=x in the (x,y) plane, but are uniformly distributed along that line. So if you change the code that Star has proposed in one line...
P = repmat(randi(N, 20, 1),1,2);
Now the points lie perfectly on the line y=x. They are uniform on that line, but are very non-uniform in the plane. However, the code that Star gave you will tell you the points were quite evenly distributed, making exactly the wrong conclusion.
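To make that failure concrete, here is a small deterministic sketch in base MATLAB. It is an illustrative construction, not part of Star's code: it places exactly one point in every diagonal cell of an N-by-N grid, and the names `H`, `flatMarginals` and `occupiedFraction` are assumptions chosen for this example. Both marginal histograms come out perfectly flat, yet only N of the N^2 cells hold a point.

```matlab
N = 9;                                      % grid size, as in the code above
P = repmat((1:N)', 1, 2);                   % points (1,1), (2,2), ..., (N,N) on y = x
Kr = accumarray(P(:,1), 1, [N 1]);          % marginal counts in x
Kc = accumarray(P(:,2), 1, [N 1]);          % marginal counts in y
H  = accumarray(P, 1, [N N]);               % joint 2-D histogram
flatMarginals = all(Kr == 1) && all(Kc == 1);   % true: both marginals are uniform
occupiedFraction = nnz(H) / numel(H);       % fraction of grid cells holding a point
fprintf('Flat marginals: %d, occupied cells: %.3f\n', flatMarginals, occupiedFraction);
```

Any test that looks only at `Kr` and `Kc` cannot tell this pattern apart from a truly even one. The joint histogram `H` can.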
As I suggested, a K-S test would tell you differently. There are several ways one could do such a test. I might suggest creating a 2-d histogram of points, then stringing it out into a one dimensional set of bins. This will catch if any place in the 2-d sampling of points is under-represented.
But be careful: if you have relatively few points compared to the number of bins, then histogram-based techniques will again fail.
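One possible reading of the flattened-histogram suggestion, sketched in base MATLAB: bin the points on a k-by-k grid, string the bins out into a vector, and take the largest gap between the cumulative observed fraction and the cumulative fraction expected under uniformity. The bin count `k`, the two test patterns, and reporting only the raw distance `D` without a p-value are all assumptions made for this sketch. For unbinned 1-D data, the Statistics and Machine Learning Toolbox function `kstest` would be the usual route. Note also that `D` depends on the order in which the bins are strung out.

```matlab
k = 3;                                      % bins per side (assumed)
[gx, gy] = meshgrid(1:k);
patterns = {[gx(:) gy(:)], ...              % pattern 1: one point in every bin
            repmat([1 1], k^2, 1)};         % pattern 2: all points in one corner bin
D = zeros(1, numel(patterns));
for p = 1:numel(patterns)
    idx = patterns{p};                      % integer bin indices, n-by-2
    H = accumarray(idx, 1, [k k]);          % 2-D histogram of the points
    observed = cumsum(H(:)) / sum(H(:));    % empirical CDF over the flattened bins
    expected = (1:k^2)' / k^2;              % CDF of a uniform distribution over bins
    D(p) = max(abs(observed - expected));   % K-S-type distance
end
disp(D)
```

The evenly filled grid gives a distance of 0 and the single-corner pattern gives 8/9, the largest value possible with nine bins. With only 2 to 20 points, `k` has to stay small for the bin counts to mean anything.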
Star Strider
on 1 Mar 2015
I considered the K-S test, but since the data in my example are generated by a uniform distribution by design, I didn’t see that using it would add any information.
However, I don’t know how Patrick’s data are generated, so a K-S test against a uniform distribution could definitely be useful if the data are empirical and there are enough of them.
John D'Errico
on 1 Mar 2015
I think it is crucial to know more about the points themselves, and what characteristics they might have, as well as the goals for this task. Roughly how many points would there be? Are the points nominally randomly sampled, and the goal is to test for uniformity? Or if this is a problem of parameter estimation, then this might be a design of experiments issue.
Without knowing more, I think it is too difficult to give a truly useful answer.
Star Strider
on 1 Mar 2015
I agree. The original Question mentions ‘(2 to 20)’ data points, so I used 20 in my simulation. I know nothing more about them, other than that there is a possibility that they could all end up in one corner of the plot. From that, I believe we can infer they’re empirical, but we know nothing else about the process that created them (and that could help provide a definitive answer).
Patrick
on 1 Mar 2015
Star Strider
on 1 Mar 2015
OK. I’ll add ‘image processing’ to the tags to see if Image Analyst has any thoughts.