I want to compare fit() using option 'sin1' with the same equation defined using 'fittype'.

24 visualizaciones (últimos 30 días)
I have used fit() with the option 'sin1' to fit a noisy set of data that very approximately look like a sine wave. To understand how 'fittype' works, I would like to fit the curve 'a*sin(b*x+c)' using 'fittype'. This should give the same answer. It does if the StartPoint is the answer; it does not if I omit StartPoint or put in some random values.
How do I make 'fittype' robust to a wide range of (noisy) data and different starting points. The standard option using fit() and 'sin1' does this.
I attach some simple sample code.
% I am trying to use the fit function to fit sine and/or cosine curves to a
% cyclic dataset that is very noisy. I am most interested in the frequency
% but the noise often prevents the FFT function from giving the frequency
% that I am interested in.
% This code uses the fit function with the predefined curve 'sin1' to
% calculate the best fit. This works very well. I then try and do the
% same thing using my own curve, 'a*sin(b*x+c)', i.e. the same curve, but I
% fail. I only succeed when the StartPoint is very close to the correct
% solution.
% I feel that I must be doing something wrong, but I cannot work out what
% it is.
xdata = (-19.5:19.5)'; % I think that any evenly spaced data will do
rdata = [-3.8699, -11.9740, -1.0781, 9.4637, ... % Real data from my application
18.2971, 3.7348, -11.5774, 0.3397, 8.4653, 7.7785, -1.4079, -13.4483, -7.6968, ...
5.1592, 6.4323, -8.2318, -14.8538, 0.8788, 10.9452, 5.6580, -9.0037, -10.0607, ...
-0.5338, 8.2230, 3.8553, 1.3426, -1.0028, -1.8683, 6.7254, 10.0907, 3.2068, ...
-7.2804, -4.3917, 6.5188, 5.7636, -3.0115, -13.8272, -12.7043, 5.0029, 9.9405]';
p = polyfit(xdata, rdata, 3); % Subtracting a cubic curve to level the ydata
py = polyval(p, xdata);
ydata = rdata - py;
[f, gof, output] = fit(xdata, ydata, 'sin1'); % Using the standard MATLAB curve, 'sin1'
fprintf(' ''sin1'': f.a: %-6.2f f.b: %-6.2f f.c: %-6.2f gof.sse: %-6.1f\n', ...
f.a1, f.b1, f.c1, (gof.sse/numel(xdata)));
xdata10 = xdata(1):0.1:xdata(end); % Defining extra xdata to give a smoother line
fy = feval(f, xdata10);
mysin = fittype('a*sin(b*x+c)', ... % My attempt to do the same thing
'dependent', {'y'}, 'independent', {'x'}, 'coefficients', {'a','b','c'});
[f2, gof2, output2] = fit(xdata, ydata, mysin, 'StartPoint', [1.0, 1.0, 1.0]);
fprintf(' mysin: f.a: %-6.2f f.b: %-6.2f f.c: %-6.2f gof.sse: %-6.1f\n', ...
f2.a, f2.b, f2.c, (gof2.sse/numel(xdata)));
fy2 = feval(f2, xdata10);
plot(xdata, ydata, 'xb:')
hold on
plot(xdata10, fy, ' m--')
plot(xdata10, fy2, ' r-')
legend('Real data', '''sin1'' fit', 'My fit');
hold off
  2 comentarios
dpb
dpb el 20 de Feb. de 2023
Editada: dpb el 20 de Feb. de 2023
xdata = (-19.5:19.5)'; % I think that any evenly spaced data will do
rdata = [-3.8699, -11.9740, -1.0781, 9.4637, ... % Real data from my application
18.2971, 3.7348, -11.5774, 0.3397, 8.4653, 7.7785, -1.4079, -13.4483, -7.6968, ...
5.1592, 6.4323, -8.2318, -14.8538, 0.8788, 10.9452, 5.6580, -9.0037, -10.0607, ...
-0.5338, 8.2230, 3.8553, 1.3426, -1.0028, -1.8683, 6.7254, 10.0907, 3.2068, ...
-7.2804, -4.3917, 6.5188, 5.7636, -3.0115, -13.8272, -12.7043, 5.0029, 9.9405]';
p = polyfit(xdata, rdata, 3); % Subtracting a cubic curve to level the ydata
py = polyval(p, xdata);
ydata = rdata - py;
plot(xdata,rdata)
hold on
plot(xdata,py)
p
p = 1×4
1.0e-05 * 0.0003 -0.0026 -0.0866 -0.1570
The data are symmetric; the polynomial does nothing useful, at least for the sample dataset.
However, just as a minor simplification, MATLAB supplies the equivalent with detrend so one can write the above as
ydata=detrend(rdata,3);
if there are datasets for which it might be helpful.
As for the Q? posed, as the doc notes, sinN models are very sensitive to starting points and the MATLAB builtin solver uses an (AFAICT undocumented) optimization technique internally to get an initial starting point estimate and bounds the coefficients >0 whereas the custom model uses simply a randomized starting point on the [0,1] interval and no bounds on the coefficients.
To reproduce that result, you would also have to use a fit options object and starting points at least reasonably close to those the MATLAB heuristic model uses (which, of course, is undocumented/proprietary).
So, the Q? in return is, if the builtin model produces the desired results, why not use it instead? Other than simply knowing what the builtin algorithm is, which may be of interest and perhaps even needed (to publish?).
I couldn't in a quick look, find anything more about the algorithm TMW uses...whether there's any other information publicly available or not, I don't know.
Nick Granville
Nick Granville el 21 de Feb. de 2023
My aim is to understand what 'fit' and 'fittype' do, rather than tell the world what MATLAB do. Ultimately, I may want to fit 'a*(sin(b*x+c))^0.5'. I wanted to ensure that I got the best result, rather than a nonsense result.
As an aside, I accept that the equation should properly be written: 'a*sign(sin(bx+c))*(abs(sin(b*x+c)))^0.5'. Choosing the best starting points may become even more important.

Iniciar sesión para comentar.

Respuesta aceptada

Matt J
Matt J el 20 de Feb. de 2023
Editada: Matt J el 20 de Feb. de 2023
How do I make 'fittype' robust to a wide range of (noisy) data and different starting points. The standard option using fit() and 'sin1' does this.
When you specify 'sin1', the solver knows what kind of fit it is looking for and can choose an appropriate strategy for forming the initial StatPoint. Conversely, when you specify the fittype as a custom model, the burden falls upon you to do that.
Choosing a StartPoint randomly is not a shrewd strategy. You know for example, that
a=(max(ydata)-min(ydata))/2
will be a better choice for a than an arbitrary selection. You can also then evaluate curve at particular x,y points to solve approximately for the other parameters. For example, since Y(0)=a*sin(c), you could do,
Y0=interp1(xdata,ydata,0);
c=asin(Y0/a);
  6 comentarios
Nick Granville
Nick Granville el 21 de Feb. de 2023
Thank you both for your comments. You confirmed that fitting 'sin1' or 'a*sin(b*x+c)' should behave similarly. The skill is in choosing the starting points. Therefore:
temparray = ydata(1 + find(diff(sign(round(diff(ydata)))) ~= 0));
sp1 = mean(abs(temparray));
sp2 = numel(temparray) * pi() / (numel(ydata)-1);
sp3 = pi()/2;
temparray contains the values of all the turning points (without using the Signal Processing Toolbox). sp1 (i.e. 'a') is the mean absolute value of the turning points and so a good indicator of amplitude. sp2 (or 'b') is related to the number of turning points and so a good indicator of frequency. I am still working on getting a robust value of sp3 (or 'c') that is not overly dependent on the first value of my data.
Matt J
Matt J el 21 de Feb. de 2023
I am still working on getting a robust value of sp3 (or 'c') that is not overly dependent on the first value of my data.
The method in my comment above does so.

Iniciar sesión para comentar.

Más respuestas (0)

Categorías

Más información sobre Smoothing en Help Center y File Exchange.

Productos


Versión

R2021b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by