Check if 2D points are evenly distributed

Question

0 votos

Hi,

I have a plane and a few data points (2 to 20) on that plane. Now I want to know how evenly they are distributed.

For example if I have 4 points and one is in every corner it would be perfect. If all 4 points are in the same spot it would be worst case.

I am not really interested in the perfect way to calculate this. I want to do this very often so my main focus is on speed.

I tried to find a solution to this problem but I think I don't know what I am even looking for. I hope someone can give me the right idea.

Best Regards

0 comentarios
Mostrar -2 comentarios más antiguos Ocultar -2 comentarios más antiguos

Iniciar sesión para comentar.

Iniciar sesión para responder a esta pregunta.

Follow Question

Answer 1

John D'Errico el 1 de Mzo. de 2015

Editada: John D'Errico el 1 de Mzo. de 2015

Abrir en MATLAB Online

4 votos

Your goal is a measure of the how evenly points are distributed in what I assume is a rectangular domain. The problem is, you need to explain your goals here. How do you define goodness?

Suppose we had two points that were perfectly coincident, but all other points in the rectangle were perfectly evenly distributed. Would this be worse than having 3 points that were all moderately close together, and the remainder of the points all again perfectly evenly distributed? Or perhaps you might have two separate pairs that are each close together, but not coincident?

In order to define some measure of even-ness, it seems that you need to explain your goals far more clearly. How do we know that something is bad?

My suggestion, having now heard more of what you want, is to compute the interpoint distance matrix between all points in the set. I've got a tool on the file exchange that does this (IPDM), but it is trivial to compute, especially for such a small set of points.

For example, given a set of points in the unit square, lets see what we can do.

N = 20;
% the first column here is x, the second column y.
P = rand(N,2); 
% D will be an NxN matrix.
D = sqrt(bsxfun(@minus,P(:,1),P(:,1).').^2 + bsxfun(@minus,P(:,2),P(:,2).').^2);
% compute the minimum interpoint distance, for
% each point. This is essentially the distance for
% each point to its nearest neighbor.
Dmin = min(D + realmax*eye(N));
Dmin
Dmin =
Columns 1 through 11
    0.20294      0.11087      0.29086      0.11087    0.0066339      0.10327      0.13755      0.26768      0.23623        0.244      0.25469
Columns 12 through 20
    0.10327    0.0066339      0.10841      0.13755       0.2456      0.13776      0.20294      0.10841      0.11378

As you can see, the minimum distance will in general appear twice in that list, because point 5 is closest to point 13, and point 13 is closest to point 5.

Alternatively, if you have the stats toolbox, you can use the function pdist. It too gives us the set of interpoint distances, although it returns only what is essentially the upper triangle of that distance matrix.

min(pdist(P))
ans =
    0.0066339

One idea you might consider is to compare that overall minimum distance to the some measure of a best case possible. For this, think about the idea of circle packing in a rectangular region. While I could probably do some online research on the best case circle packing for N circles in a unit square, a simple measure is given by a ratio of areas.

% area of the rectangle, length*height
Arect = 1*1;
% for N circles, we assign each circle 1/N of that area,
% therefore compute an effective radius, IF we could pack
% N circles perfectly with no overlap.
Crad = sqrt(Arect/N/pi)
Crad =
      0.12616

So if we could have N perfectly evenly distributed points (here, N=20 over a unit square) then the minimum interpoint distance could at best be roughly TWICE that radius. (This argument has some flaws in it, which are worst for a small number of points, where we would want to consider the optimum circle packing in some domain. I could spend some time and improve that estimate, were it of interest. This would take only a wee bit of algebra.)

The point is, we can now compare the minimum interpoint distance to that value. A truly even sampling of points will have no pair of points that are significantly closer together than twice Crad. As you can see, the random sampling I generated above is flawed in that respect, since

min(Dmin)/(2*Crad)
ans =
    0.026292

is quite small. This is actually to be expected from the "uniform" random sampling, for anyone who has seen the birthday paradox in action. In fact, the closest point from such a random sampling will be much smaller than that predicted for an "even" sampling as I am choosing to define it.

As I said, if it is of interest, I can give you a better value to compare against.

Edit: I decided that I had some time to do this right.

Given a rectangle of length L, height H, with N points to be distributed around the rectangle. We want to find a measure of the best possible circle packing inside that rectangle, where the center of any circle can go as far as a corner of the rectangle, but no circle is allowed to overlap. Assume the circles have radius r.

syms N L H r
rmax = solve((L+2*r)*(H + 2*r) - N*(2*r)^2)
rmax =
  (H + L - (H^2 + L^2 - 2*H*L + 4*H*L*N)^(1/2))/(4*(N - 1))
  (H + L + (H^2 + L^2 - 2*H*L + 4*H*L*N)^(½))/(4*(N - 1))

I'll let the reader figure out how the expression I solve for arises. It seems clear to me. :)

As it turns out, only the second of those solutions is positive. I'll substitute 20 for N, and do it over the unit square, so L=H=1 here.

rmax = rmax(2);
subs(rmax,{N,L,H},[20 1 1])
ans =
  80^(1/2)/76 + 1/38
vpa(ans)
ans =
  0.14400357776314682612679861414375

So twice that radius would correspond to a maximally evenly packed set of 20 points over the unit square.

There are still some flaws with the above argument, wherein a truly optimal circle packing for N points would be a somewhat better result, but all we want is a basic measure to compare against. So it should be entirely adequate, although it will be only approximate for very small sets of points, like 2. So in fact, for two points in the unit square, the best packing would allow circles of radius sqrt(2)/2 = 0.7071...

subs(rmax,{N,L,H},[2 1 1])
ans =
  8^(1/2)/4 + 1/2
vpa(ans)
ans =
  1.2071067811865475244008443621048

It is only a simple measure as I said, and we need not have it be perfect.

1 comentario
Mostrar -1 comentarios más antiguos Ocultar -1 comentarios más antiguos

Patrick el 2 de Mzo. de 2015

Thanks a lot for this detailed answer. I suppose I will go with this one for now. I already tried something like this but as you pointed out we need something to compare against which I was missing. Now I have that.

Iniciar sesión para comentar.

Answer 2

Image Analyst el 1 de Mzo. de 2015

4 votos

Professor Adrian Baddeley is a leading researcher in spatial statistics and a Science Fellow with CSIRO and University of Western Australia. He wrote the bible on spatial statistics. He gives a workshop on "Analysing spatial point patterns in R". His book and papers may still be available online.

In his papers you'll see that a pattern or points can go from regular and periodic (like an array or grid), to random and uniform (Poisson distribution like raindrops on the ground), to clustered and clumped.

And of course there are states in between. The degree to which the pattern matches one of those classifications can be measured, and he gives algorithms for doing that. For example he says "One simple diagnostic for dependence between points is a Morishita plot. The spatial domain is divided into quadrats, and the chi square statistic based on quadrat counts is computed. The quadrats are repeatedly subdivided. The Morishita plot shows the chi square statistic against the linear size of the quadrats."

He uses real world cases like locations of trees and ants. You can even have multiple populations like 4 species of trees and 2 species of ants. Besides answering questions like how uniformly distributed are the individuals, you can also answer questions like is tree species 1 preferentially near tree species 2? Or are black ant hills preferentially near red ant hills? If you're interested in spatial statistics, I suggest you look up his books and papers.

4 comentarios
Mostrar 2 comentarios más antiguos Ocultar 2 comentarios más antiguos

Patrick el 2 de Mzo. de 2015

Editada: Patrick el 2 de Mzo. de 2015

I agree, this is very interesting. But I am not sure yet I want to put that much work in this project. But it is probably the best solution.

Edit: Seems like I can only select one as an answer. I wanted to mark this one too...

Joe el 25 de Jun. de 2015

This paper was very useful and shows the equations to use to figure out how evenly points are distributed.

Paper titled: "Clustering, Randomness, and Regularity: Spatial Distributions and Human Performance on the Traveling Salesperson Problem and Minimum Spanning Tree Problem" from Purdue

Iniciar sesión para comentar.

Answer 3

Star Strider el 28 de Feb. de 2015

Abrir en MATLAB Online

2 votos

I don’t know how involved you want to get. I’m not aware of a statistical test for ‘evenly distributed’, so I created one that might work for you. It looks at the row and column distributions of the points on the plot, does a linear regression on that and does a simple statistical test (confidence intervals) on the slope. If the slope is not statistically different from zero (that is, it is not needed in the regression, so the confidence limits include zero), you can assume the row and column indices are essentially ‘evenly distributed’. (I used the normal distribution 95% confidence intervals, ±1.96. A t-distribution would be more accurate.)

Experiment with this to fit your application, interpret the results as you wish:

N = 9;                                      % Number Of Points
P = randi(N, 20, 2);                        % Create Points
bins = [1:N];                               % Bins For ‘histc’
den = [ones(N,1) bins'];                    % Regression Denominator 
XTX = den'*den;                             % CovB Denominator
Kr = histc(P(:,1), bins);                   % Row Counts
Br = den\Kr;                                % Linear Regression On Row Counts
Kc = histc(P(:,2), bins);                   % Column Counts
Bc = den\Kc;                                % Linear Regression On Column Counts
df = N-2;                                   % Degrees-Of-Freedom: Points-Parameters
CovBr = var(Kr-den*Br)./XTX;                % Covariance Matrix For ‘Br’
CovBc = var(Kc-den*Bc)./XTX;                % Covariance Matrix For ‘Bc’
MrCI = [-1.96 1.96]*CovBr(1,1) + Br(2);     % Row-Slope Confidence Intervals 
McCI = [-1.96 1.96]*CovBc(1,1) + Bc(2);     % Column-Slope Confidence Intervals
figure(1)
plot(P(:,1), P(:,2), 'bp')
grid
axis([0  10    0  10])

6 comentarios
Mostrar 4 comentarios más antiguos Ocultar 4 comentarios más antiguos

John D'Errico el 1 de Mzo. de 2015

Abrir en MATLAB Online

Actually, it has been a while, but I recall there is a statistical test for an "even distribution" of points. It is called the " Kolmogorov-Smirnov test " You would use that test to compare the sampling of your data points to a uniform distribution.

Why use it? For example, the idea that Star has proposed will fail for a simple u-shaped non-uniformity pattern. So, leave all of the bins of a histogram unchanged, but simply add some extra points to the first and last bins of the histogram. Or put relatively fewer points in the first and last bins of that same histogram.

There are other schemes that will cause the linear regression test Star proposed to fail. Star looks only at the marginals in x and in y, creating what are called Marginal distributions in x then in y of your 2-d sampling. The problem here is that there are many ways such a test can fail, since it is quite trivial to provide a very non-uniform sampling that has perfectly uniform marginal distributions.

A simple way for that to fail is if your points all lie along the line y=x in the (x,y) plane, but are uniformly distributed along that line. So if you change the code that Star has proposed in one line...

P = repmat(randi(N, 20, 1),1,2);

Now the points lie perfectly on the line y=x. They are uniform on that line, but are very non-uniform in the plane. However, the code that Star gave you will tell you the points were quite evenly distributed, making exactly the wrong conclusion.

As I suggested, a K-S test would tell you differently. There are several ways one could do such a test. I might suggest creating a 2-d histogram of points, then stringing it out into a one dimensional set of bins. This will catch if any place in the 2-d sampling of points is under-represented.

But be careful though, as if you have relatively few points compared to the sampling frequency, then histogram based techniques will again fail.

Patrick el 1 de Mzo. de 2015

Hi & thanks

I didn't work through your answers yet. Just wanted to add the missing information:

The data points are "measured in real world". They are points found in a picture so I have a x and y value for each point. The points can't be exactly in the same spot but they can be extremely close to each other (like 0.1 pixels close). Sorry I should have added this in my post.

Star Strider el 1 de Mzo. de 2015

OK. I’ll add ‘image processing’ to the tags to see if Image Analyst has any thoughts.

Iniciar sesión para comentar.

Check if 2D points are evenly distributed

0 comentarios
Mostrar -2 comentarios más antiguos Ocultar -2 comentarios más antiguos

Respuesta aceptada

1 comentario
Mostrar -1 comentarios más antiguos Ocultar -1 comentarios más antiguos

Más respuestas (2)

4 comentarios
Mostrar 2 comentarios más antiguos Ocultar 2 comentarios más antiguos

6 comentarios
Mostrar 4 comentarios más antiguos Ocultar 4 comentarios más antiguos

Categorías

Productos

Etiquetas

Community Treasure Hunt

Check if 2D points are evenly distributed

0 comentarios Mostrar -2 comentarios más antiguos Ocultar -2 comentarios más antiguos

Respuesta aceptada

1 comentario Mostrar -1 comentarios más antiguos Ocultar -1 comentarios más antiguos

Más respuestas (2)

4 comentarios Mostrar 2 comentarios más antiguos Ocultar 2 comentarios más antiguos

6 comentarios Mostrar 4 comentarios más antiguos Ocultar 4 comentarios más antiguos

Categorías

Productos

Etiquetas

Ver también

Community Treasure Hunt

0 comentarios
Mostrar -2 comentarios más antiguos Ocultar -2 comentarios más antiguos

1 comentario
Mostrar -1 comentarios más antiguos Ocultar -1 comentarios más antiguos

4 comentarios
Mostrar 2 comentarios más antiguos Ocultar 2 comentarios más antiguos

6 comentarios
Mostrar 4 comentarios más antiguos Ocultar 4 comentarios más antiguos