Statistical Test for decaying signals
I have two decaying relative intensity curves and would like a statistical test to show that they are different. Each time point on each curve is produced by averaging 100 measured data points. Does anyone have any suggestions? The data are:
Data 1: 1 0.914144 0.876253 0.836468 0.806563 0.781585 0.744672 0.727541 0.695955 0.677459 0.630814 0.637396 0.609646 0.569227 0.565882 0.529177 0.520497 0.514375 0.504086 0.474612 0.447513 0.425238 0.432216 0.441622 0.407928 0.381347 0.387921 0.387443 0.380426 0.363821 0.353484
Data 2: 0.984578 0.9664 0.985515 0.98057 1 0.980536 0.930023 0.957503 0.903321 0.886397 0.897744 0.821625 0.85142 0.833694 0.826525 0.81353 0.768527 0.793422 0.81677 0.76768 0.773302 0.777807 0.736474 0.693616 0.694688 0.74992 0.712753 0.700593 0.708191 0.677843 0.720385
Time: 0 60 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800
Answers (2)
‘... each time point on each curve is produced by averaging from 100 taken data points ...’
The statistical test depends on what the data represent, and the characteristics of those data. However, just using the mean values is not going to be of any real value, since you also need to have measures of the dispersion of the data, specifically the variance, and if applicable, the standard deviation (since not all distributions — such as the lognormal distribution — have standard deviations).
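For instance, if the per-time-point means and standard deviations of those 100 raw measurements were available, the two curves could be compared point by point with Welch's t-test computed from summary statistics alone. A minimal sketch in Python/SciPy (the standard deviations below are invented purely for illustration, since the question only gives the means):

```python
from scipy import stats

# Hypothetical summary statistics for a single time point;
# only the means come from the question -- the SDs are made up.
mean1, sd1, n1 = 0.914144, 0.05, 100   # curve 1 at t = 60
mean2, sd2, n2 = 0.966400, 0.05, 100   # curve 2 at t = 60

# Welch's t-test computed directly from summary statistics
t, p = stats.ttest_ind_from_stats(mean1, sd1, n1, mean2, sd2, n2,
                                  equal_var=False)
print(f't = {t:.3f}, p = {p:.2g}')
```

The same idea applies at every time point; with 31 such tests, a multiple-comparison correction (e.g. Bonferroni) would be appropriate.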
If you do not know the underlying distributions of the data, my suggestion would be to use a nonparametric test. Several could work; however, the Friedman test (the friedman function) might be the most appropriate here.
Data_1 = [1 0.914144 0.876253 0.836468 0.806563 0.781585 0.744672 0.727541 0.695955 0.677459 0.630814 0.637396 0.609646 0.569227 0.565882 0.529177 0.520497 0.514375 0.504086 0.474612 0.447513 0.425238 0.432216 0.441622 0.407928 0.381347 0.387921 0.387443 0.380426 0.363821 0.353484];
Data_2 = [0.984578 0.9664 0.985515 0.98057 1 0.980536 0.930023 0.957503 0.903321 0.886397 0.897744 0.821625 0.85142 0.833694 0.826525 0.81353 0.768527 0.793422 0.81677 0.76768 0.773302 0.777807 0.736474 0.693616 0.694688 0.74992 0.712753 0.700593 0.708191 0.677843 0.720385];
Time = [ 0 60 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800];
figure
plot(Time, Data_1, '.-', 'DisplayName','Data_1')
hold on
plot(Time, Data_2, '.-', 'DisplayName','Data_2')
hold off
grid
legend('Location','best')
4 comments
John D'Errico
24 Sept 2024
"(since not all distributions — such as the lognormal distribution — have standard deviations)"
Incorrect. A lognormal distribution DOES indeed have a standard deviation.
The Wikipedia article on the lognormal distribution gives its variance; the standard deviation is the square root of the variance, so if the variance is well defined, then so is the standard deviation. Both the variance and the standard deviation of a lognormal will be large, and not terribly useful in terms of how we usually think about those parameters, just because we tend to think of variances in terms of a normal distribution. For example, we tend to think of a mean plus or minus some number of standard deviations; those habits are naturally burned into our brains when we do any kind of statistics.
However, for a lognormal distribution, defined in terms of the mean (mu) and variance (sigma^2) of the underlying normal, the mean of the lognormal will be:
exp(mu + sigma^2/2)
and the standard deviation is:
sqrt(exp(sigma^2) - 1)*exp(mu + sigma^2/2)
Now we can compute the point where k standard deviations takes you below 0. For a standard lognormal (mu = 0, sigma = 1), the mean is exp(1/2) and the standard deviation is sqrt(exp(1) - 1)*exp(1/2), so
mean/sd = 1/sqrt(exp(1) - 1) = 0.7629 (approximately)
So anything below the mean minus about 0.76 standard deviations for a standard lognormal yields a negative number.
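These formulas are easy to verify numerically; here is a plain Python check (rather than MATLAB) for mu = 0, sigma = 1:

```python
import math

mu, sigma = 0.0, 1.0   # "standard" lognormal

# Mean and standard deviation of the lognormal, in terms of the
# underlying normal parameters mu and sigma
mean = math.exp(mu + sigma**2 / 2)
sd = math.sqrt(math.exp(sigma**2) - 1) * math.exp(mu + sigma**2 / 2)

# Number of standard deviations separating the mean from zero
k = mean / sd   # = 1/sqrt(e - 1)
print(f'mean = {mean:.4f}, sd = {sd:.4f}, k = {k:.4f}')
```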
Had you said that a Cauchy distribution (or certain others; a Cauchy is the one that immediately comes to mind for me) does not have a variance or a standard deviation, you would have been absolutely correct.
Star Strider
24 Sept 2024
I interpreted the standard deviation not being listed among the properties in the Wikipedia article to mean that it was not defined for the lognormal distribution. I tend not to use it, preferring to calculate percentiles with the logninv function when I need them.
Henry Carey-Morgan
8 Oct 2024
To do what I suggested, you need the original data at each point.
That would go something like this —
Data_1 = [1 0.914144 0.876253 0.836468 0.806563 0.781585 0.744672 0.727541 0.695955 0.677459 0.630814 0.637396 0.609646 0.569227 0.565882 0.529177 0.520497 0.514375 0.504086 0.474612 0.447513 0.425238 0.432216 0.441622 0.407928 0.381347 0.387921 0.387443 0.380426 0.363821 0.353484];
Data_2 = [0.984578 0.9664 0.985515 0.98057 1 0.980536 0.930023 0.957503 0.903321 0.886397 0.897744 0.821625 0.85142 0.833694 0.826525 0.81353 0.768527 0.793422 0.81677 0.76768 0.773302 0.777807 0.736474 0.693616 0.694688 0.74992 0.712753 0.700593 0.708191 0.677843 0.720385];
Time = [ 0 60 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800];
nPoints = numel(Time)
% Simulate 10 raw observations around each time point (for
% illustration only -- the question describes 100 per point)
Data_Orig_1 = Data_1 + (rand(10, numel(Data_1)) - 0.5);
Data_Orig_2 = Data_2 + (rand(10, numel(Data_2)) - 0.5);
orng = [0.9 0.5 0.2];
friedman_data = [Data_Orig_1(:) Data_Orig_2(:)]
[p,T,S] = friedman(friedman_data, size(Data_Orig_1,1))
figure
hp1 = plot(Time, Data_1, '.-b', 'DisplayName','Data_1', 'LineWidth',1.5);
hold on
plot(Time, Data_Orig_1, '.b')
hp2 = plot(Time, Data_2, '.-', 'DisplayName','Data_2', 'Color',orng, 'LineWidth',1.5);
plot(Time, Data_Orig_2, '.', 'Color',orng)
hold off
grid
xlabel('Time')
ylabel('Value')
legend([hp1 hp2],'Location','best')
Here, the matrix for the Friedman test consists of vertically concatenated columns of the data around the original points, of which there are uniformly 10 per time point (the row-group size passed as the second argument to friedman), creating a (310x2) matrix. The friedman function then compares the two columns and determines that the difference between them is statistically significant (in this instance). I also considered using multcompare; however, with only two groups it is likely not necessary here.
I have never done anything even remotely like this, nor seen it done. (I have only compared two models fitted to the same data with a likelihood ratio test.) I believe the Friedman test is appropriate for this problem. In any event, I cannot envision any other way to approach it.
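As an independent sanity check (not the Friedman procedure above): with only two paired groups, a Wilcoxon signed-rank test on the 31 paired time-point means is a common nonparametric alternative. In Python/SciPy it would look like:

```python
from scipy import stats

# Per-time-point means from the question
data_1 = [1, 0.914144, 0.876253, 0.836468, 0.806563, 0.781585,
          0.744672, 0.727541, 0.695955, 0.677459, 0.630814, 0.637396,
          0.609646, 0.569227, 0.565882, 0.529177, 0.520497, 0.514375,
          0.504086, 0.474612, 0.447513, 0.425238, 0.432216, 0.441622,
          0.407928, 0.381347, 0.387921, 0.387443, 0.380426, 0.363821,
          0.353484]
data_2 = [0.984578, 0.9664, 0.985515, 0.98057, 1, 0.980536, 0.930023,
          0.957503, 0.903321, 0.886397, 0.897744, 0.821625, 0.85142,
          0.833694, 0.826525, 0.81353, 0.768527, 0.793422, 0.81677,
          0.76768, 0.773302, 0.777807, 0.736474, 0.693616, 0.694688,
          0.74992, 0.712753, 0.700593, 0.708191, 0.677843, 0.720385]

# Paired nonparametric test on the per-time-point differences
stat, p = stats.wilcoxon(data_1, data_2)
print(f'W = {stat}, p = {p:.2g}')
```

Note this treats each time-point mean as a single paired observation and ignores the within-point variability, so it is weaker than a test that uses the 100 raw values per point.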
Jeff Miller
25 Sept 2024
One simple approach is to fit a straight line to each dataset and show that the slopes are statistically different. For example,
% I'm dividing Time by 1000 to get more readable slope values--i.e.,
% decrease per 1000 time units.
mdl1 = fitlm(Time/1000,Data_1);
ci1 = coefCI(mdl1);
mdl2 = fitlm(Time/1000,Data_2);
ci2 = coefCI(mdl2);
fprintf('Slope for data 1 = %f with 95 pct confidence interval %f to %f\n',mdl1.Coefficients.Estimate(2),ci1(2,1),ci1(2,2));
fprintf('Slope for data 2 = %f with 95 pct confidence interval %f to %f\n',mdl2.Coefficients.Estimate(2),ci2(2,1),ci2(2,2));
% Slope for data 1 = -0.326349 with 95 pct confidence interval -0.355400 to -0.297297
% Slope for data 2 = -0.185232 with 95 pct confidence interval -0.204295 to -0.166170
Since the confidence intervals don't overlap (and it's not even close), you are statistically justified in concluding that the decrease is steeper for data 1 than for data 2.
If you need an actual p value for a test of the difference in slopes, you'll need to do a bit more work.
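One way to get that p value (a sketch using the large-sample normal approximation for the difference of two independent slope estimates, built from the standard errors reported by the fits; here in Python/SciPy rather than MATLAB):

```python
import math
from scipy import stats

time = [60 * k for k in range(31)]   # 0, 60, ..., 1800
data_1 = [1, 0.914144, 0.876253, 0.836468, 0.806563, 0.781585,
          0.744672, 0.727541, 0.695955, 0.677459, 0.630814, 0.637396,
          0.609646, 0.569227, 0.565882, 0.529177, 0.520497, 0.514375,
          0.504086, 0.474612, 0.447513, 0.425238, 0.432216, 0.441622,
          0.407928, 0.381347, 0.387921, 0.387443, 0.380426, 0.363821,
          0.353484]
data_2 = [0.984578, 0.9664, 0.985515, 0.98057, 1, 0.980536, 0.930023,
          0.957503, 0.903321, 0.886397, 0.897744, 0.821625, 0.85142,
          0.833694, 0.826525, 0.81353, 0.768527, 0.793422, 0.81677,
          0.76768, 0.773302, 0.777807, 0.736474, 0.693616, 0.694688,
          0.74992, 0.712753, 0.700593, 0.708191, 0.677843, 0.720385]

t_k = [t / 1000 for t in time]   # same rescaling as above
fit1 = stats.linregress(t_k, data_1)
fit2 = stats.linregress(t_k, data_2)

# z-test on the slope difference, treating the two fits as independent
z = (fit1.slope - fit2.slope) / math.hypot(fit1.stderr, fit2.stderr)
p = 2 * stats.norm.sf(abs(z))
print(f'slope diff = {fit1.slope - fit2.slope:.4f}, z = {z:.2f}, p = {p:.2g}')
```

The slopes match the fitlm values above; the normal approximation is reasonable here given the 29 residual degrees of freedom per fit.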