Ezyfit: Single vs. Two-Step Fitting Problem

6 views (last 30 days)
Sim on 13 Mar 2025 · Edited: Sim on 17 Mar 2025
Ezfit (I am actually using the updated version from Walter Roberson) fails to accurately determine all 4 parameters in a single curve fit (see Method A). A workaround, fixing 1 parameter initially and then fitting the remaining 3 in a second step (Method B), succeeds instead.
Is there a method or setting within Ezfit to obtain accurate 4-parameter results in a single fitting run (Method A)?
% (1) Input Data
x = [6, 16, 26, 36, 46, 56, 66, 66, 76, 86, 96, 126, 176, 186, 246, 286, 306, 336, 366, 396, 426, 486, 496, 506, 546, 546, 616, 656, 726, 756, 806, 816, 836, 946, 966, 976, 1056, 1066, 1116, 1136, 1176, 1296, 1326, 1386, 1426, 1456, 1476, 1586, 1596, 1736, 1806, 1836, 1846, 1886, 2016, 2046, 2106, 2196, 2296, 2346, 2356, 2366, 2556, 2666, 2806, 2856, 2916, 2976, 3116, 3266, 3366, 3416, 3526, 3726, 3876, 3976, 4386, 4636, 4686, 5246, 5346, 6966];
y = [22.932, 13.063, 10.877, 8.7408, 7.4074, 5.8344, 12.693, 4.8185, 3.5444, 2.6637, 2.2515, 4.5284, 1.7597, 1.5207, 1.1754, 0.74499, 0.75637, 0.28157, 0.35862, 0.37542, 0.27759, 0.25008, 0.11389, 0.23569, 0.16777, 0.084717, 0.11865, 0.03168, 0.085227, 0.042803, 0.038632, 0.016028, 0.049869, 0.034689, 0.025157, 0.0064107, 0.017007, 0.0067062, 0.012242, 0.0065558, 0.014677, 0.0048147, 0.0048443, 0.0045972, 0.0059391, 0.0024968, 0.0050464, 0.0024622, 0.0053329, 0.0038574, 0.0035999, 0.0022632, 0.00147, 0.00098146, 0.0012814, 0.0024717, 0.00073639, 0.00073862, 0.0013448, 0.00088996, 0.00096546, 0.00084, 0.00062343, 0.0014789, 0.00050959, 0.00027922, 0.00055839, 0.0019602, 0.00098146, 0.00020941, 0.00028421, 0.00075468, 0.00031831, 0.00055524, 0.00055844, 0.00039789, 0.00031207, 0.00072048, 0.00021507, 0.0004718, 0.00018294, 0.00018086];
% (2) Method A: Perform a simple curve fitting with Ezfit
ft1 = ezfit(x,log(y),'log(a*x^(-b) + c*exp(-d*x))','MaxFunEvals', 1e6, 'MaxIter', 1e6); %, 'usega', usega);
array2table([ft1.m(1) ft1.m(2) ft1.m(3) ft1.m(4) (ft1.r)^2],'VariableNames',{'a','b','c','d','R2'})
% a b c d R2
% _____ ______ ______ _____ _______
%
% 56106 2.2236 -85026 49582 0.85991
% (3) Method B: Perform curve fitting by using Ezfit 2 times
% (3.1) With the first Ezfit call, find the exponent "b" (scanned over a range of values between 1.5 and 2.5)
% that maximizes the coefficient of determination R^2 (capped at R^2 < 0.99).
% The resulting exponent is "b_range(idx_max_b) = 1.86".
b_range = 1.5 : 0.01 : 2.5;
ftr = zeros(size(b_range)); % preallocate the R^2 value for each candidate exponent
i = 1;
for b = b_range
ft2 = ezfit(x,log(y),sprintf('log(a*x^(-%.2f) + c*exp(-d*x))',b),'options',optimset('MaxFunEvals', 1e6, 'MaxIter', 1e6));
ftr(i) = (ft2.r)^2;
i = i + 1;
end
idx_max_b = find(ftr == max(ftr(ftr < 0.99)));
% (3.2) Following the initial determination of the optimal exponent "b" (i.e. "b_range(idx_max_b)"),
% the second Ezfit step finds the remaining parameters
ft3 = ezfit(x,log(y),sprintf('log(a*x^(-%.2f) + c*exp(-d*x))',b_range(idx_max_b)),'options',optimset('MaxFunEvals', 1e6, 'MaxIter', 1e6));
array2table([ft3.m(1) b_range(idx_max_b) ft3.m(2) ft3.m(3) (ft3.r)^2],'VariableNames',{'a','b','c','d','R2'})
% a b c d R2
% ______ ____ _____ _________ _______
%
% 2084.4 1.86 4.806 0.0064196 0.98323
% (4) Plot
hold on
plot(x, y, 'o', 'MarkerFaceColor',[0.7 0.7 0.7],'DisplayName', 'Input Data');
xx = 10 : 10000;
plot(xx, ft1.m(1) .* xx.^(-ft1.m(2)) + ft1.m(3) .* exp(-ft1.m(4) .* xx), 'Color', 'blue', 'LineWidth', 2, 'DisplayName', 'Method A');
plot(xx, ft3.m(1) .* xx.^(-b_range(idx_max_b)) + ft3.m(2) .* exp(-ft3.m(3) .* xx), 'Color', 'red', 'LineWidth', 2, 'DisplayName', 'Method B')
xlabel('x');
ylabel('y');
legend('show');
set(gca, 'XScale', 'log', 'YScale', 'log')
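For comparison, the grid-search idea behind Method B can be sketched generically: fix the awkward exponent b on a grid, fit the remaining parameters for each candidate, and keep the candidate with the best goodness of fit. The following Python/NumPy snippet is only an illustration of that pattern, using synthetic data and a deliberately simplified model y = a*x^(-b) + c (so that, for fixed b, the remaining fit is plain linear least squares); it is not a reimplementation of ezfit:

```python
import numpy as np

# Synthetic (hypothetical) data from known parameters a=100, b=1.86, c=0.5
x = np.linspace(1.0, 50.0, 200)
y = 100.0 * x**(-1.86) + 0.5

# Method B pattern: sweep the nonlinear exponent b over a grid; for each
# candidate, solve the remaining (linear) parameters [a, c] by least squares
b_grid = np.arange(1.5, 2.51, 0.01)
best = None
for b in b_grid:
    A = np.column_stack([x**(-b), np.ones_like(x)])  # columns for a and c
    coef = np.linalg.lstsq(A, y, rcond=None)[0]
    sse = np.sum((A @ coef - y)**2)                  # sum of squared errors
    if best is None or sse < best[0]:
        best = (sse, b, coef)
sse, b_hat, (a_hat, c_hat) = best
print(b_hat, a_hat, c_hat)  # picks b = 1.86 and recovers a and c
```

The loop over b_range in the MATLAB code above follows the same pattern, with ezfit playing the role of the inner solver.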
  7 comments
Torsten on 14 Mar 2025 · Edited: 14 Mar 2025
As said: you must decide whether you want to fit log(y) against log(x) or y against x. The two will give different parameters from the fitting process. Usually, fitting y against x is intended.
Say your fitting function is y = a*x^b. Taking the log on both sides gives log(y) = log(a) + b*log(x), i.e. a function where log(y) depends linearly on log(x). The latter is numerically simpler, but the results for a and b will differ from the nonlinear fit of y against x.
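This point can be checked numerically. Below is a minimal Python/NumPy sketch (an illustration, not part of the MATLAB workflow above) with hypothetical noiseless data generated from known parameters a = 2.5 and b = 1.7; a straight-line fit of log(y) against log(x) recovers b as the slope and log(a) as the intercept:

```python
import numpy as np

# Hypothetical data from a known power law y = a*x^b (a=2.5, b=1.7)
a_true, b_true = 2.5, 1.7
x = np.linspace(1.0, 10.0, 50)
y = a_true * x**b_true

# log(y) = log(a) + b*log(x): a straight line in log-log space,
# so an ordinary degree-1 polynomial fit recovers both parameters
slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
b_fit, a_fit = slope, np.exp(intercept)
print(a_fit, b_fit)  # recovers 2.5 and 1.7 (up to rounding)
```

With noisy data the two approaches genuinely differ: the log-log fit minimizes errors in log(y), effectively weighting relative errors, while the direct nonlinear fit minimizes errors in y itself.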
Sim on 14 Mar 2025
Thanks @Torsten for your clarifications! :-)


Accepted Answer

Walter Roberson on 13 Mar 2025
Is there a method or setting within Ezfit to obtain accurate 4-parameter results in a single fitting run (Method A)?
No.
  8 comments
Walter Roberson on 14 Mar 2025
@Alex Sha used a third-party optimizer named 1stOpt. It is a somewhat expensive tool (about US$2,000 per user), but from what I have seen, it produces excellent results much faster than competitors. I would consider buying it myself, but I cannot currently justify the price.
Sim on 17 Mar 2025
Thanks @Walter Roberson for sharing! :-) I have checked it on the web, and it looks very interesting and promising (though academics might find the cost prohibitive if the price is the same for them).


More Answers (2)

Sam Chak on 14 Mar 2025
Hi @Sim
You can verify this. The model proposed by @Torsten fits quite well within the first 10% of the data range. After that point, however, the model decays much more slowly than the data, although, in theory, it will eventually converge to zero. Inspired by this model, you could consider implementing a nonlinear power rate, as illustrated below. You could also look for a better nonlinear function, preferably a bounded one.
format long g
x = [6, 16, 26, 36, 46, 56, 66, 66, 76, 86, 96, 126, 176, 186, 246, 286, 306, 336, 366, 396, 426, 486, 496, 506, 546, 546, 616, 656, 726, 756, 806, 816, 836, 946, 966, 976, 1056, 1066, 1116, 1136, 1176, 1296, 1326, 1386, 1426, 1456, 1476, 1586, 1596, 1736, 1806, 1836, 1846, 1886, 2016, 2046, 2106, 2196, 2296, 2346, 2356, 2366, 2556, 2666, 2806, 2856, 2916, 2976, 3116, 3266, 3366, 3416, 3526, 3726, 3876, 3976, 4386, 4636, 4686, 5246, 5346, 6966];
y = [22.932, 13.063, 10.877, 8.7408, 7.4074, 5.8344, 12.693, 4.8185, 3.5444, 2.6637, 2.2515, 4.5284, 1.7597, 1.5207, 1.1754, 0.74499, 0.75637, 0.28157, 0.35862, 0.37542, 0.27759, 0.25008, 0.11389, 0.23569, 0.16777, 0.084717, 0.11865, 0.03168, 0.085227, 0.042803, 0.038632, 0.016028, 0.049869, 0.034689, 0.025157, 0.0064107, 0.017007, 0.0067062, 0.012242, 0.0065558, 0.014677, 0.0048147, 0.0048443, 0.0045972, 0.0059391, 0.0024968, 0.0050464, 0.0024622, 0.0053329, 0.0038574, 0.0035999, 0.0022632, 0.00147, 0.00098146, 0.0012814, 0.0024717, 0.00073639, 0.00073862, 0.0013448, 0.00088996, 0.00096546, 0.00084, 0.00062343, 0.0014789, 0.00050959, 0.00027922, 0.00055839, 0.0019602, 0.00098146, 0.00020941, 0.00028421, 0.00075468, 0.00031831, 0.00055524, 0.00055844, 0.00039789, 0.00031207, 0.00072048, 0.00021507, 0.0004718, 0.00018294, 0.00018086];
funlist = {@(coef, xdata) xdata.^(-coef(1)*(xdata).^(0.4)), @(coef, xdata) exp(-coef(2)*xdata)};
NLPstart = [1, 1];
options = optimset('disp', 'none');
[INLP, ILP] = fminspleas(funlist, NLPstart, x, y, [], [], [], options)
Warning: Rank deficient, rank = 1, tol = 1.957112e+05.
Warning: Rank deficient, rank = 1, tol = 6.998085e-03.
Warning: Rank deficient, rank = 1, tol = 6.998085e-03.
INLP = 1×2
    0.0594819972748915   3.50973243531023
ILP = 2×1
    22.0279801300898
    7302263919.12418
xf = linspace(x(1), x(end), numel(x)*1e3+1);
yf = ILP(1)*xf.^(-INLP(1)*(xf).^(0.4)) + ILP(2)*exp(-INLP(2)*xf);
%% Plot results
figure
plot(x, y, 'o'), hold on
plot(xf, yf), grid on, hold off
legend('data', 'model')
xlabel('x'), ylabel('y')
set(gca, 'XScale', 'log','YScale', 'log')
figure
subplot(211)
plot(x, y, 'o'), hold on
plot(xf, yf), xlim([0 700]), grid on, hold off
legend('data', 'model')
xlabel('x'), ylabel('y')
title({'Displaying model fitness in interval $x \in [0, 700]$'}, 'interpreter', 'latex')
subplot(212)
plot(x, y, 'o'), hold on
plot(xf, yf), xlim([700 7000]), grid on, hold off
legend('data', 'model')
xlabel('x'), ylabel('y')
title({'Displaying model fitness in interval $x \in [700, 7000]$'}, 'interpreter', 'latex')
  3 comments
Sim on 14 Mar 2025 · Edited: 14 Mar 2025
Hi @Sam Chak, thanks for your nice answer :-)
I do not know how to get R-squared with "fminspleas", but it looks like a nice fit!
It looks like you used a "power tower" or "tetration", plus the exponential decay... this makes things a bit more complicated :-) I was wondering whether, by using the "power tower", the exponential decay might no longer be necessary (?):
x = [6, 16, 26, 36, 46, 56, 66, 66, 76, 86, 96, 126, 176, 186, 246, 286, 306, 336, 366, 396, 426, 486, 496, 506, 546, 546, 616, 656, 726, 756, 806, 816, 836, 946, 966, 976, 1056, 1066, 1116, 1136, 1176, 1296, 1326, 1386, 1426, 1456, 1476, 1586, 1596, 1736, 1806, 1836, 1846, 1886, 2016, 2046, 2106, 2196, 2296, 2346, 2356, 2366, 2556, 2666, 2806, 2856, 2916, 2976, 3116, 3266, 3366, 3416, 3526, 3726, 3876, 3976, 4386, 4636, 4686, 5246, 5346, 6966];
y = [22.932, 13.063, 10.877, 8.7408, 7.4074, 5.8344, 12.693, 4.8185, 3.5444, 2.6637, 2.2515, 4.5284, 1.7597, 1.5207, 1.1754, 0.74499, 0.75637, 0.28157, 0.35862, 0.37542, 0.27759, 0.25008, 0.11389, 0.23569, 0.16777, 0.084717, 0.11865, 0.03168, 0.085227, 0.042803, 0.038632, 0.016028, 0.049869, 0.034689, 0.025157, 0.0064107, 0.017007, 0.0067062, 0.012242, 0.0065558, 0.014677, 0.0048147, 0.0048443, 0.0045972, 0.0059391, 0.0024968, 0.0050464, 0.0024622, 0.0053329, 0.0038574, 0.0035999, 0.0022632, 0.00147, 0.00098146, 0.0012814, 0.0024717, 0.00073639, 0.00073862, 0.0013448, 0.00088996, 0.00096546, 0.00084, 0.00062343, 0.0014789, 0.00050959, 0.00027922, 0.00055839, 0.0019602, 0.00098146, 0.00020941, 0.00028421, 0.00075468, 0.00031831, 0.00055524, 0.00055844, 0.00039789, 0.00031207, 0.00072048, 0.00021507, 0.0004718, 0.00018294, 0.00018086];
funlist = {@(coef, xdata) xdata.^(-coef(1)*(xdata).^(coef(2)))};
NLPstart = [1, 1];
options = optimset('disp', 'none');
[INLP, ILP] = fminspleas(funlist, NLPstart, x, y, [], [], [], options)
xf = linspace(x(1), x(end), numel(x)*1e3+1);
yf = ILP(1)*xf.^(-INLP(1)*(xf).^(INLP(2)));
hold on
plot(x, y, 'o')
plot(xf, yf)
set(gca, 'XScale', 'log','YScale', 'log')
However, the idea of using a "simple" power law plus an exponential decay is supported by a theoretical argument... with a power tower instead, the theoretical justification would be more complicated... :-)
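On the R-squared remark above: fminspleas does not report R^2 directly, but it can always be computed afterwards from the fitted model's predictions as R^2 = 1 - SS_res/SS_tot. A small Python/NumPy sketch with made-up data and a stand-in model (purely illustrative; substitute your own y and model predictions):

```python
import numpy as np

# Made-up data and a stand-in "fitted model" y_hat = 2*x (hypothetical)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
y_hat = 2.0 * x

# Coefficient of determination: R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y - y_hat)**2)        # residual sum of squares
ss_tot = np.sum((y - np.mean(y))**2)   # total sum of squares
r2 = 1.0 - ss_res / ss_tot
print(r2)
```

In MATLAB the same two lines work with sum() and mean() on the residuals of the fminspleas model.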
Sim on 14 Mar 2025 · Edited: 14 Mar 2025
Thanks @Sam Chak! I have just read your comment, a second after I posted mine...! Yes, it could also be an exponential function-based model. I will look into it further. However, the sum of a simple power law and a simple exponential decay can be justified through a theoretical argument (in particular the power law term). If we use nonlinear power rates (whether power law-based or exponential-based), the results could become much harder to interpret... :-)
...I can keep an eye on this webpage to see if @Alex Sha might be interested in adding more on this topic. By the way, there are some great fitting tools I wasn’t aware of, like "fminspleas", which looks quite impressive! :-)



Alex Sha on 15 Mar 2025 · Moved: Sam Chak on 15 Mar 2025
Hi all, if one more parameter is added to the original fitting function, i.e. y = ln(a*x^(-b) - c*exp(-d*x))*f, the result improves:
Sum Squared Error (SSE): 62.0296923820521
Root of Mean Square Error (RMSE): 0.869746896054109
Correlation Coef. (R): 0.970745675455549
R-Square: 0.94234716641565
Parameter Best Estimate
--------- -------------
a 0.97506498591964
b -0.0034886080404169
c 1.00586174849318
d 0.0069594641193838
f -5.56698125787981
  4 comments
Alex Sha on 17 Mar 2025
I am not sure whether my understanding is correct: if you want the fitting result to be good on a log() scale, is it reasonable to first transform the y data to y1 = log(y)? The fitting function then becomes "y1=log(a*x^(-b)-c*exp(-d*x))", and the result will be:
Sum Squared Error (SSE): 3.16256795387294
Root of Mean Square Error (RMSE): 0.196387122481336
Correlation Coef. (R): 0.991266174106857
R-Square: 0.982608627928446
Parameter Best Estimate
--------- -------------
a 10.6590120083535
b 0.69245173591868
c -1.65676470503943
d 0.00313457062827038
If instead the function is taken as "y1=f*log(a*x^(-b)-c*exp(-d*x))":
Sum Squared Error (SSE): 2.95997975085014
Root of Mean Square Error (RMSE): 0.189992931538933
Correlation Coef. (R): 0.99182732299325
R-Square: 0.983721438635958
Parameter Best Estimate
--------- -------------
f 4.78460068319797
a 0.918658120426212
b 0.0761834706279075
c -0.540969703020373
d 0.00132426521534454
Sim on 17 Mar 2025 · Edited: 17 Mar 2025
Thanks a lot @Alex Sha! Now it is clear! :-)
Therefore, we can get an R-square equal to 0.982608627928446, by using
y1 = log(a*x^(-b) - c*exp(-d*x))
and we can get an R-square equal to 0.983721438635958, by using
y1 = f * log(a*x^(-b) - c*exp(-d*x))
By multiplying by a factor "f", the fit improves further, but even without it, the fit is already quite good. :-)
Thanks a lot!

