Identifying the linear part of a curve

113 visualizaciones (últimos 30 días)
schilakwad
schilakwad el 15 de Feb. de 2021
Comentada: Star Strider el 15 de Feb. de 2021
Hi,
I am trying to identify the x and y values where the linear regime ends of a curve that is fairly noisy.
I've tried a few different methods to identify my coordinates; first, I've tried using the ischange function, but have some trouble identifying if the MaxNumChanges value is appropriate. The values generated from specifying 3 maximum changes seems to be the most accurate for my system, but I'd like to know if there is a better way to this.
Second, I've also tried using a threshold value as an input instead of MaxNumChanges but am unsure of what value to select. Using any Threshold value from 30-76 provides the same result as having MaxNumChanges set to 3.
Lastly, I've also simply tried to fit a linear model to the linear regime and excluded data points until the R2 value is unaffected.
I'd appreciate is any one can tell me what the most robust way to do this is.
Heres a picture shwowing the coordinates with varying MaxNumChanges:
Here's the code I used:
T1 = readtable('BLL_96wp_Matlab test.xlsx');
T1.Properties.VariableNames = {'Molecules','A245'};
% ThrVal = 50;
% [TF,Sl,Ic] = ischange(T1.Molecules, 'linear','Threshold',ThrVal);
[TF,SI,Ic] = ischange(T1.Molecules, 'linear','MaxNumChanges',3);
Idx = find(TF);
figure
plot(T1.Molecules, T1.A245)
hold on
plot(T1.Molecules(Idx(1)), T1.A245(Idx(1)), '+r')
hold off
grid
text(T1.Molecules(Idx(1)), T1.A245(Idx(1)), sprintf('\\leftarrowLinear Region End:\n Molecules = %10.4f\n A245 = %10.4f', T1.Molecules(Idx(1)), T1.A245(Idx(1))), 'HorizontalAlignment','left', 'VerticalAlignment','top')
Thank you!

Respuestas (1)

Star Strider
Star Strider el 15 de Feb. de 2021
Editada: Star Strider el 15 de Feb. de 2021
The problem is that the data are noisy, so they need to be made less so.
Also, the smoothing and ischange functions need to use ‘A245’, not ‘Molecules’, at least if I correctly assume what you want to do.
Try this:
T1 = readtable('BLL_96wp_Matlab test.xlsx');
T1.Properties.VariableNames = {'Molecules','A245'};
N = height(T1);
A245 = filtfilt(ones(1,fix(N/20)), N/20, T1.A245);
% ThrVal = 50;
% [TF,Sl,Ic] = ischange(T1.Molecules, 'linear','Threshold',ThrVal);
[TF,SI,Ic] = ischange(A245, 'linear');%,'MaxNumChanges',3);
Idx = find(TF);
figure
plot(T1.Molecules, T1.A245)
hold on
plot(T1.Molecules(Idx(end)), T1.A245(Idx(end)), '+r')
hold off
grid
xlabel('Molecules')
ylabel('A_{245}')
text(T1.Molecules(Idx(end)), T1.A245(Idx(end))*1.025, sprintf('\\leftarrowLinear Region End:\n Molecules = %10.4f\n A245 = %10.4f', T1.Molecules(Idx(end)), T1.A245(Idx(end))), 'HorizontalAlignment','left', 'VerticalAlignment','top')
figure
plot(T1.Molecules, A245)
grid
xlabel('Molecules')
ylabel('A_{245}')
title('Smoothed A_{245}')
The second plot is not necessary for the code. It simply shows the effect of using a simple moving-average filter to smooth the data. (I have no idea what version/release you are using, or what Toolboxes you have, so using filter instead of filtfilt will likely produce an acceptable result, as well as movmean or smoothdata. I wrote my enhancements to use the simplest options available.)
EDIT — (15 Feb 2021 at 14:28)
Added plot figure:
.
  3 comentarios
Image Analyst
Image Analyst el 15 de Feb. de 2021
One way you could do it is to do a piecewise linear fit. Basically I divide the data into two sections (left and right) and fit lines. The "splitting point" is where the difference in slopes is the greatest.
% Find optimal pair of lines to fit noisy data, one line on left side and one line on right side. Separation/crossing x value is identified.
clc; % Clear command window.
clear; % Delete all variables.
close all; % Close all figure windows except those created by imtool.
workspace; % Make sure the workspace panel is showing.
fontSize = 18;
markerSize = 20;
% Read in table.
T1 = readtable('BLL_96wp_Matlab test.xlsx');
T1.Properties.VariableNames = {'Molecules','A245'};
% Get x and y for convenient names.
x = T1.Molecules;
yNoisy = T1.A245;
% Define number of points.
numPoints = length(x);
% Plot lines.
subplot(2, 2, 1);
plot(x, yNoisy, 'b.', 'MarkerSize', markerSize);
grid on;
xlabel('x', 'FontSize', fontSize);
ylabel('y', 'FontSize', fontSize);
title('Noisy Y vs. X data', 'FontSize', fontSize);
% Enlarge figure to full screen.
set(gcf, 'Units', 'Normalized', 'OuterPosition', [0, 0.04, 1, 0.96]);
% Assume the crossing point will be somewhere in the middle half of the points.
% Fit a line through the right and left parts and get the slopes.
% Keep the point where the slope difference is greatest.
index1 = find(x > 2, 1, 'first') % We need enough to define a fairly good line.
index2 = numPoints - 5 % % We need enough to define a fairly good line so use at least 25 points on the right hand side.
% In other words, assume that we need at least several of the points to make a good estimate of the line.
% Obviously if we took only 2 or 3 points, then the slope could vary quite dramatically,
% so let's use at least enough of the points to make sure we don't get crazy slopes.
% Initialize structure array
for k = 1 : numPoints
lineData(k).slopeDifferences = 0;
lineData(k).line1 = [0,0];
lineData(k).line2 = [0,0];
end
for k = index1 : index2
% Get data in left side.
x1 = x(1:k);
y1 = yNoisy(1:k);
% Fit a line through the left side.
coefficients1 = polyfit(x1, y1, 1); % The slope is coefficients1(1).
% Get data in right side.
x2 = x(k+1:end);
y2 = yNoisy(k+1:end);
% Fit a line through the left side.
coefficients2 = polyfit(x2, y2, 1); % The slope is coefficients2(1).
% Compute difference in slopes, and store in structure array along with line equation coefficients.
lineData(k).slopeDifferences = abs(coefficients1(1) - coefficients2(1));
lineData(k).line1 = coefficients1;
lineData(k).line2 = coefficients2;
end
% Find index for which slope difference is greatest.
slopeDifferences = [lineData.slopeDifferences]; % Extract from structure array into double vector of slope differences only
% slope1s = struct2table(lineData.line1); % Extract from structure array into double vector of slopes only
% slope2s = [lineData.line2(1)]; % Extract from structure array into double vector of slopes only
[maxSlopeDiff, indexOfMaxSlopeDiff] = max(slopeDifferences)
% Plot slope differences.
subplot(2, 2, 2);
plot(slopeDifferences, 'b.', 'MarkerSize', markerSize);
grid on;
caption = sprintf('Slope Differences Maximum at Index = %d, x value = %.2f', indexOfMaxSlopeDiff, x(indexOfMaxSlopeDiff));
title(caption, 'FontSize', fontSize);
% Mark it with a red line.
line([indexOfMaxSlopeDiff, indexOfMaxSlopeDiff], [0, maxSlopeDiff], 'Color', 'r', 'LineWidth', 2);
% Show everything together all on one plot.
% Plot lines.
subplot(2, 2, 3:4);
plot(x, yNoisy, 'b.', 'MarkerSize', markerSize);
grid on;
xlabel('x', 'FontSize', fontSize);
ylabel('y', 'FontSize', fontSize);
hold on;
% Use the equation of line1 to get fitted/regressed y1 values.
slope1 = lineData(indexOfMaxSlopeDiff).line1(1);
intercept1 = lineData(indexOfMaxSlopeDiff).line1(2);
y1Fitted = slope1 * x + intercept1;
% Plot line 1 over/through data.
plot(x, y1Fitted, 'r-', 'LineWidth', 2);
% Use the equation of line2 to get fitted/regressed y2 values.
slope2 = lineData(indexOfMaxSlopeDiff).line2(1);
intercept2 = lineData(indexOfMaxSlopeDiff).line2(2);
y2Fitted = slope2 * x + intercept2;
% Plot line 2 over/through data.
plot(x, y2Fitted, 'r-', 'LineWidth', 2);
ylim([min(yNoisy), 1.2 * max(yNoisy)]);
% Mark crossing with a magenta line.
xc = (intercept2 - intercept1) / (slope1 - slope2);
line([xc, xc], [0, y2Fitted(indexOfMaxSlopeDiff)], 'Color', 'm', 'LineWidth', 2);
title('Data with left and right lines overlaid', 'FontSize', fontSize);
message1 = sprintf('Left Equation: y = %.3f * x + %.3f', slope1, intercept1);
message2 = sprintf('Right Equation: y = %.3f * x + %.3f', slope2, intercept2);
message = sprintf('%s\n%s', message1, message2);
fprintf('%s\n', message);
text(10, 2, message, 'Color', 'r', 'FontSize', 15, 'FontWeight', 'bold');
uiwait(helpdlg(message));
Another way you might define the best splitting point is where the RMS or MAD value of the residuals is minimized. So it's a bit of a judgment call as to where the "best" splitting point is -- it depends on the criteria you use to define "best".
Star Strider
Star Strider el 15 de Feb. de 2021
My pleasure!
Can you tell me why you didn't specify MaxNumChanges in your code?
That argument was not necessary. With the filtered signal, there were only 2 changes.
Your solution gives me a value of ~3.12 units, is there way to make that more accurate?
I have no idea what your signal is, what your criteria are, or what you want as a result.

Iniciar sesión para comentar.

Categorías

Más información sobre Get Started with Curve Fitting Toolbox en Help Center y File Exchange.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by