quantifying the similarity between data sets

Question

0 votes

data_sets.mat

Hi, I implemented an algorithm that tracks a particle in space and time. I applied it to two experiments and got two data sets, A=[X,Y] and B=[X,Y], of 8399 coordinate points each. The experiments were exactly the same. I plotted A and B and there are clear differences between them, but overall the points fall within similar limits. Of course, they are never going to be exactly the same because of errors in the tracking algorithm. Still, given a certain criterion, is there a method that quantifies the difference between the data sets, so that I can say "ok, they are close enough" or "no, there is too much difference between them"?

P.S. I attached the data set I am currently analysing. Thank you.


Answer 1

Image Analyst on 14 Jul 2017

0 votes

See https://www.mathworks.com/products/computer-vision/features.html#3-d-point-cloud-processing

2 comments

Daniel Mella on 16 Jul 2017

Thanks for your answer.

I tried it but it is not what I am looking for. I need a way to quantify how similar or different my plots are.

I have been thinking of estimating the spectra of A and B with the pwelch function (an FFT-based method) and then calculating the cross-correlation between the spectra. I think that will give me the similarity in X and Y.
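A rough sketch of that idea follows, for illustration only. It assumes the Signal Processing Toolbox is available, uses synthetic signals in place of the real coordinate series, and swaps in a zero-lag correlation coefficient (corrcoef) between the log power spectra rather than a full cross-correlation. None of these choices come from the thread.

```matlab
% Synthetic stand-ins for one coordinate of each track (assumption:
% the real series would be A(:,1) and B(:,1) from data_sets.mat)
rng(1);                                   % reproducible random numbers
n  = 4096;
t  = (0:n-1)';
xA = sin(2*pi*0.05*t) + 0.5*randn(n,1);   % sinusoid plus noise
xB = sin(2*pi*0.05*t) + 0.5*randn(n,1);   % same sinusoid, independent noise

% Welch power spectral density estimates with the default settings
[pA,w] = pwelch(xA);                      % w is normalized frequency (rad/sample)
[pB,~] = pwelch(xB);

% Correlation coefficient between the log spectra;
% values near 1 suggest a similar spectral shape
R = corrcoef(log10(pA), log10(pB));
r = R(1,2);
fprintf('Spectral correlation: %.3f\n', r);
```

The same two calls would be repeated for the second column to compare the Y coordinates. The resulting number is a descriptive similarity score, not a hypothesis test.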

Image Analyst on 16 Jul 2017

Methods like SIFT and SURF first identify a set of "salient points" and then use point-matching algorithms to find subsets of points that align fairly well. If you don't like the ones in the Computer Vision System Toolbox, you can use another one: https://www.google.com/#q=point+matching+algorithm

Or look into how "optical flow" (also in the CVSToolbox) works.


Answer 2

Star Strider on 16 Jul 2017

0 votes

I can’t find anything online that addresses your problem, and there may be no consensus. Some exploration of your data reveals that the x-coordinates of the two data sets are (essentially) identically distributed, as are the y-coordinates. The x- and y-coordinates have different distributions from each other, and none of them is normally distributed.

One approach therefore could be to run a Wilcoxon rank sum test (equivalent to the Mann-Whitney U test) separately on the x-coordinates of the two data sets and on their y-coordinates. This tests the null hypothesis that the medians are the same against the alternative hypothesis that they are different.

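% Load both tracks from the attached file
AB = load('data_sets.mat');
A = AB.A;
B = AB.B;
% Rank sum test on the x-coordinates (column 1), then the y-coordinates (column 2)
[p1,h1,stats1] = ranksum(A(:,1),B(:,1));
[p2,h2,stats2] = ranksum(A(:,2),B(:,2));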

These results give no evidence that the medians differ, for either the x- or the y-coordinates.

To demonstrate that the distributions of the x- and y-coordinates are not different would require a different test, such as a chi-square goodness-of-fit test of one x-coordinate distribution against the other, and similarly for the y-coordinates. (Use histogram or histcounts to generate the distributions.) You would have to write that code yourself, and then use the appropriate chi-square distribution function to calculate the p-values from your calculated chi-square statistics and degrees of freedom.
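For comparing whole distributions without hand-written code, one option not mentioned in this answer is the two-sample Kolmogorov-Smirnov test, available as kstest2 in the Statistics and Machine Learning Toolbox. The sketch below is only an illustration under stated assumptions: it uses synthetic stand-in data (the real A and B would come from data_sets.mat), and it treats the points as independent samples, which tracked positions usually are not, so the p-values should be read loosely.

```matlab
% Synthetic stand-ins for the two tracks (assumption: the real data
% would be A = AB.A and B = AB.B loaded from data_sets.mat)
rng(1);                                  % reproducible random numbers
A = rand(2000,2);                        % columns are [X, Y]
B = [A(:,1), A(:,2) + 0.5];              % same X values, shifted Y values

% Two-sample Kolmogorov-Smirnov test on each coordinate.
% h = 0: "same distribution" is not rejected at the 5% level; h = 1: rejected.
[hX,pX,ksX] = kstest2(A(:,1),B(:,1));
[hY,pY,ksY] = kstest2(A(:,2),B(:,2));

fprintf('X: h = %d, p = %.3g, KS statistic = %.3f\n', hX, pX, ksX);
fprintf('Y: h = %d, p = %.3g, KS statistic = %.3f\n', hY, pY, ksY);
```

The KS statistic is the largest gap between the two empirical cumulative distributions and lies between 0 and 1, so it can also be reported on its own as a rough distance between the data sets.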

Since a definitive discussion on this does not seem to exist, or at least has evaded my search for it, this is the best I can come up with.

3 comments

Star Strider on 17 Jul 2017

My pleasure,

I experimented with the chi-square idea in the interim:

% Common bin edges spanning both data sets (20 edges give 19 bins)
Xedges = linspace(min([A(:,1);B(:,1)]),max([A(:,1);B(:,1)]), 20);
Yedges = linspace(min([A(:,2);B(:,2)]),max([A(:,2);B(:,2)]), 20);
% Bin counts for each coordinate of each data set
HXA = histcounts(A(:,1),Xedges);
HXB = histcounts(B(:,1),Xedges);
HYA = histcounts(A(:,2),Yedges);
HYB = histcounts(B(:,2),Yedges);
% Two-sample chi-square statistics on the counts (A and B have the same
% number of points); sqrt(eps) keeps bins that are empty in both sets from giving 0/0
Chi2_X = sum((HXA-HXB).^2./(HXA+HXB+sqrt(eps)));
Chi2_Y = sum((HYA-HYB).^2./(HYA+HYB+sqrt(eps)));
% Degrees of freedom: number of occupied bins minus one
dfX = nnz(HXA+HXB)-1;
dfY = nnz(HYA+HYB)-1;
% p-values are the upper-tail probabilities of the chi-square distribution
P1 = chi2cdf(Chi2_X, dfX, 'upper');
P2 = chi2cdf(Chi2_Y, dfY, 'upper');

I believe this is correct. I’ve not written code to calculate chi-square statistics in a while. Adding ‘sqrt(eps)’ prevents Inf and NaN values in the chi-square calculations, since some of the bins have zero values.

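Unfortunately, the values I got were vanishingly small, which I took to mean that the distributions are different. Bear in mind that the p-value of a chi-square test is the upper-tail probability (chi2cdf(x,df,'upper')) of a statistic computed from bin counts, so check which tail you are reading before drawing that conclusion.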

I would be hesitant to use pwelch on random spatial data. You might want to experiment with the fft2 function instead, and the image processing functions.

Yours appears to be a relatively new problem. I am not certain how to approach it, and the literature search I did turned up no relevant results.

Kafayat Olayinka on 29 May 2020

Can you show us how to plot this and what it'll look like? Thanks
