Hello,
I have rock size data for two independent locations, A and B.
The data list rock sizes from smallest to largest (yellow column).
The red column shows the percentage of rocks under that size (out of 100%).
The green column shows the cumulative percentage (out of 100%).
**A normality test indicates that the rock data (yellow columns) are not normally distributed.**
**When I apply a log transform to the data, they become normally distributed and plot as a straight line on a graph.**
*I want to compare rock sizes between the two locations*
**What is the best way to compare rock sizes (yellow columns) given that these values are thresholds, not actual rock sizes?**

Accepted Answer

Star Strider on 20 Jan 2020

1 vote

If they both have the same distribution (regardless of what that distribution is), and you are comparing two samples, the ranksum test (the Wilcoxon rank sum test) is likely the most appropriate.
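As a rough illustration of the call, here is a minimal sketch. The two vectors are hypothetical placeholder values, not the rock data from the question, and ranksum requires the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical placeholder samples, NOT the rock data from the question.
a = [2.1 3.4 1.9 5.6 4.2 3.8];
b = [6.3 7.1 5.9 8.4 6.8 7.7];

% Wilcoxon rank sum test on two independent samples.
% p is the p-value; h = 1 means the null hypothesis of equal medians
% is rejected at the default 5% significance level.
[p, h] = ranksum(a, b);
fprintf('p = %.4f, h = %d\n', p, h);
```

The samples do not need to be the same length, which matters here if the two sites have different numbers of rows.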

4 comments

Sarah Yun on 20 Jan 2020
Hi Star,
Thank you.
My confusion comes from the fact that the data are not ACTUAL rock sizes, but some kind of threshold.
Is it possible to find the ACTUAL rock size from this data?
Star Strider on 20 Jan 2020
My pleasure!
The actual rock size data seem to have been discarded. What is left is essentially a histogram, with the bins being the yellow columns and the relative counts being the red columns.
One option is to create a single vector of rock threshold sizes (yellow column) covering both sites (using the linspace function), then use interp1 with that common threshold vector and the red columns from each site separately to estimate the frequencies of rocks at those common sizes at each site. Do not extrapolate, so that the rock sizes that are not shared by both sites will be NaN for site A. You can then use isnan to eliminate the NaN values from the site A interpolation.
Then use the interpolation results with ranksum. This is likely as close as you can get for derived data.
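The steps above could be sketched roughly as follows. All numbers are hypothetical placeholders, not the data from the question, and the variable names (sizeA, pctA, commonSize and so on) and the grid length of 50 are assumptions made for illustration only.

```matlab
% Hypothetical placeholder values, NOT the data from the question.
% size* = threshold sizes (yellow column), pct* = percentages (red column).
sizeA = [2 4 8 16 32];       pctA = [10 25 35 20 10];
sizeB = [1 2 4 8 16 32 64];  pctB = [5 10 20 30 20 10 5];

% Common grid spanning the threshold sizes of both sites
% (50 points is an arbitrary choice).
commonSize = linspace(min([sizeA sizeB]), max([sizeA sizeB]), 50);

% Linear interpolation. With no extrapolation argument, interp1 returns
% NaN for query points outside the range of each site's thresholds.
interpA = interp1(sizeA, pctA, commonSize);
interpB = interp1(sizeB, pctB, commonSize);

% Keep only the sizes covered by both sites.
valid   = ~isnan(interpA) & ~isnan(interpB);
interpA = interpA(valid);
interpB = interpB(valid);

% Rank sum test on the interpolated frequencies
% (requires Statistics and Machine Learning Toolbox).
[p, h] = ranksum(interpA, interpB);
```

The mask here drops NaN entries from both sites, which is slightly more general than removing them from site A alone; if only site A has the narrower range, the result is the same.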
Sarah Yun on 20 Jan 2020
Thank you Star.
Is it possible to just run ranksum on the yellow column (the bins) for both data sets? Or would this violate a rule?
Star Strider on 20 Jan 2020
My pleasure.
It would likely be more appropriate to use it on the red columns, since (as I understand it) those are the relative frequencies of the sizes.


Asked: 20 Jan 2020

Commented: 20 Jan 2020
