Extrapolation gives wrong values when using interp1

I have two different arrays and want to do interpolation and extrapolation.
X-value = [0 386.5 446.1 526.6 621.5 660.6 711.4 734.9 792.8 810.2 817.9 893.7 1136.8 1317 1420.2 1426.2]
Y-value = [1.225 1.216 1.203 1.194 1.182 1.178 1.171 1.169 1.161 1.160 1.160 1.151 1.122 1.101 1.091 1.091]
The y-values decrease roughly linearly as the x-values increase. I want to interpolate and extrapolate so that I can find the y-value for any x-value. This is the code I tried for finding the y-value at, for example, x = 2000:
interp1(Height,Density,2000,'linear','extrap');
When I extrapolate to x-values above the largest x-value, which is 1426.2, the y-value does not decrease. Instead it stays nearly flat and even increases slightly as x increases. The original curve is clearly decreasing linearly. I have no idea what the problem is. I hope someone can help.

Accepted Answer

Stephen23 on 27 Mar 2018
Edited: Stephen23 on 28 Mar 2018
"Extrapolation gives wrong values when using interp1"
interp1 is giving the right value.
The problem is simple: the last two points have the same Y value, so any linear extrapolation will simply continue with that value. Linear interpolation/extrapolation of a new point takes into account (at most) only two data points, which means that the overall downward trend of your data is irrelevant, because the last two points are these:
X = [... 1420.2 1426.2];
Y = [... 1.091 1.091];
So all larger X values will simply return Y = 1.091, no matter how large you make X.
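A minimal sketch of that point, assuming Height and Density (the names used in the question's interp1 call) hold exactly the values displayed in the question, which may not match what is actually stored. It reproduces the extrapolated value by hand from the last two points only and compares it with interp1:

```matlab
% Data exactly as displayed in the question (ASSUMED to be the stored values).
Height  = [0 386.5 446.1 526.6 621.5 660.6 711.4 734.9 792.8 810.2 817.9 893.7 1136.8 1317 1420.2 1426.2];
Density = [1.225 1.216 1.203 1.194 1.182 1.178 1.171 1.169 1.161 1.160 1.160 1.151 1.122 1.101 1.091 1.091];

xq = 2000;  % query point beyond the last sample

% Slope of the final segment: the only information linear extrapolation uses.
endSlope = (Density(end) - Density(end-1)) / (Height(end) - Height(end-1));

% Continue that final segment out to xq by hand ...
yManual = Density(end) + endSlope * (xq - Height(end));

% ... and compare with what interp1 returns.
yInterp = interp1(Height, Density, xq, 'linear', 'extrap');

fprintf('end slope = %g, manual = %.6f, interp1 = %.6f\n', endSlope, yManual, yInterp);
```

With these values the final slope is exactly zero, so both results equal 1.091 to within round-off, regardless of how steeply the earlier points fall.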
To resolve this you might want to add some smoothing, or do some subsampling, or fit a line (e.g. using least squares), or merge points that are close together, or ... whatever makes sense for your data.
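As one possible illustration of the least-squares option, here is a sketch that fits a straight line to all sixteen points with polyfit and evaluates it at x = 2000. It assumes the displayed values are the real data and that a straight line is an adequate model, which only the person who knows the underlying system can judge:

```matlab
% Data exactly as displayed in the question (ASSUMED to be the stored values).
Height  = [0 386.5 446.1 526.6 621.5 660.6 711.4 734.9 792.8 810.2 817.9 893.7 1136.8 1317 1420.2 1426.2];
Density = [1.225 1.216 1.203 1.194 1.182 1.178 1.171 1.169 1.161 1.160 1.160 1.151 1.122 1.101 1.091 1.091];

% Least-squares straight line through ALL points: p(1) is the slope, p(2) the intercept.
p = polyfit(Height, Density, 1);

% Evaluate the fitted line beyond the data range.
yFit = polyval(p, 2000);

fprintf('slope = %g per unit x, fitted y(2000) = %.4f\n', p(1), yFit);
```

Because the fitted slope reflects the overall downward trend rather than the last segment alone, the value at x = 2000 comes out below 1.091.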

12 Comments

Torsten on 28 Mar 2018
And how do you explain that the y-value is actually increasing? Rounding errors?
Stephen23 on 28 Mar 2018

That would be a good guess. It is quite possible that the last two values differ by a small amount >=eps, in which case extrapolating to X = one million might produce a detectable change in value (although, given the precision of the data, such a change would likely be meaningless). But without the actual data values in a .mat file, the actual new X values used, and the same MATLAB version, it is impossible to say what actually occurred or why.
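A sketch of how such a hidden difference could produce the reported behavior. The stored value 1.091001 for the last point is purely hypothetical, chosen so that it would still display as 1.0910 with four decimals; nothing in the thread confirms the real stored values:

```matlab
% Data as displayed in the question.
Height  = [0 386.5 446.1 526.6 621.5 660.6 711.4 734.9 792.8 810.2 817.9 893.7 1136.8 1317 1420.2 1426.2];
Density = [1.225 1.216 1.203 1.194 1.182 1.178 1.171 1.169 1.161 1.160 1.160 1.151 1.122 1.101 1.091 1.091];

% HYPOTHETICAL stored value for the last point: 1e-6 larger than the one before it.
Density(end) = 1.091001;

% Extrapolate to increasingly distant query points.
xq = [1500 2000 1e6];
yq = interp1(Height, Density, xq, 'linear', 'extrap');

% fprintf cycles through the columns of [xq; yq], printing one x/y pair per line.
fprintf('x = %g -> y = %.6f\n', [xq; yq]);
```

Under this assumption the final segment has a tiny positive slope, so the extrapolated values creep upward: barely visible at x = 2000, and large at x = one million.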

Torsten on 28 Mar 2018
But the data are listed above ...
Stephen23 on 28 Mar 2018
Edited: Stephen23 on 28 Mar 2018
"But the data are listed above ..."
How data is displayed is a totally different thing from how data is stored in memory, e.g.:
etc. etc. Note also that that Y value cannot be represented exactly as a double, so what is displayed is already an imperfect representation of the "actual" data.
The default format is short, which displays "Short, fixed-decimal format with 4 digits after the decimal point." This matches what the question shows, which means that we could have up to 1e-4 difference in the actual stored values shown in the question, before they will be displayed differently, e.g.:
>> A = 1.091049
A = 1.0910
>> B = 1.090951
B = 1.0910
>> A-B
ans = 9.8000e-005
Are A and B the same? No. Do they display the same? Yes.
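One way to check this on real data is to print the values with more digits than the display format shows. In this sketch, sprintf with '%.4f' is only a stand-in for the four-decimal display of format short (an assumption made for illustration), and '%.17g' prints enough digits to distinguish any two different doubles:

```matlab
% The two example values from above.
A = 1.091049;
B = 1.090951;

% Four decimals, mimicking what "format short" displays for values of this size.
shortA = sprintf('%.4f', A);
shortB = sprintf('%.4f', B);
fprintf('4 decimals : %s  %s\n', shortA, shortB);

% Seventeen significant digits, enough to tell any two distinct doubles apart.
fprintf('17 digits  : %.17g  %.17g\n', A, B);

% Exact comparison of the stored values (prints 1 if identical, 0 if not).
fprintf('identical? : %d\n', isequal(A, B));
```

Applying the same fprintf call to the last two elements of the real Y vector, or switching to format long, would show whether they are truly equal.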
Of course we don't know where those values were copied from, so this is just speculation. So far the only information I actually have is two vectors of values pasted into this thread. It is likely that those values were generated from some calculations, or perhaps from importing some data: it seems unlikely that the data was input into MATLAB by writing that exact vector shown in the question... unlikely, but certainly possible. However I don't know this, because this information was not explained in the question. Just speculation.
So far Espen Mikkelsen has explained nothing about:
  • the data class,
  • how that data was generated,
  • how that data was manipulated,
  • how/where that data was displayed or printed before copying here,
  • the format, fprintf format string, precision, etc that was used when displaying, printing, etc.
This means that currently no one on this forum has any idea what the "actual" data is (as clearly this is not the same as the displayed data). As I wrote in my reply to your first comment, once we get the exact values in a .mat file (which retains the full numeric precision of the "actual" data), then we could investigate this further.
Torsten on 28 Mar 2018
Edited: Torsten on 28 Mar 2018
Thanks for your detailed comment. I assumed that the vectors were explicitly set in MATLAB and couldn't find a reason why - in this case - the Y-value at x=2000 shouldn't be 1.091 (assuming a linear extrapolation).
Thank you for the helpful answer. I should have seen those values. I hope you don't mind if I ask one more question about the interpolation method. When the new point is inside the boundary of the X vector, is the method the same there, that the method only takes into account two data points? I have now used the curve fitting tool to fit a straight line. Do you know if the interpolation and extrapolation method there also the same?
Stephen23 on 28 Mar 2018
Edited: Stephen23 on 28 Mar 2018
"When the new point is inside the boundary of the X vector, is the method the same there, that the method only takes into account two data points?"
This is the only definition of linear interpolation that I know of. The interp1 help describes the linear option as "The interpolated value at a query point is based on linear interpolation of the values at neighboring grid points in each respective dimension." For your vector of data there are (at most) two data points that are neighbors (along the X axis) of the new point (assuming that query points which coincide with data points are not interpolated). If you are not sure how linear interpolation works then there are plenty of online tutorials that will explain this for you.
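For illustration, a sketch that interpolates by hand using only the two neighbors of a query point and compares the result with interp1. The query point xq = 500 is an arbitrary choice, and the data are assumed to be exactly the values displayed in the question:

```matlab
% Data exactly as displayed in the question (ASSUMED to be the stored values).
Height  = [0 386.5 446.1 526.6 621.5 660.6 711.4 734.9 792.8 810.2 817.9 893.7 1136.8 1317 1420.2 1426.2];
Density = [1.225 1.216 1.203 1.194 1.182 1.178 1.171 1.169 1.161 1.160 1.160 1.151 1.122 1.101 1.091 1.091];

xq = 500;  % arbitrary query point inside the data range

% Index of the left neighbor: the last sample that is not greater than xq.
k = find(Height <= xq, 1, 'last');

% Fractional position of xq between its two neighbors (0 at left, 1 at right).
t = (xq - Height(k)) / (Height(k+1) - Height(k));

% Weighted average of the two neighboring values; no other points are involved.
yManual = (1 - t) * Density(k) + t * Density(k+1);

% The same query through interp1.
yInterp = interp1(Height, Density, xq, 'linear');

fprintf('neighbors at x = %g and %g, manual = %.6f, interp1 = %.6f\n', ...
    Height(k), Height(k+1), yManual, yInterp);
```

The two results agree to within round-off, and changing any sample other than the two neighbors leaves them unchanged.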
"Do you know if the interpolation and extrapolation method there also the same?"
Linear interpolation does not change: it has a very clearly specified meaning in mathematics. Read the Curve Fitting Toolbox documentation to find out what it does.
Yixuan Ren on 11 May 2019
Edited: Yixuan Ren on 11 May 2019
This is a fantastic insight!
Btw, does MATLAB have any function which uses all the data points to extrapolate?
Hmm. This is the second time I have said this today. Extrapolation is a dangerous thing to do. It is rarely easy, unless you happen to think like Mark Twain.
"In the space of one hundred and seventy six years the Lower Mississippi has shortened itself two hundred and forty-two miles. That is an average of a trifle over a mile and a third per year. Therefore, any calm person, who is not blind or idiotic, can see that in the Old Oölitic Silurian Period, just a million years ago next November, the Lower Mississippi was upwards of one million three hundred thousand miles long, and stuck out over the Gulf of Mexico like a fishing-pole. And by the same token any person can see that seven hundred and forty-two years from now the Lower Mississippi will be only a mile and three-quarters long, and Cairo [Illinois] and New Orleans will have joined their streets together and be plodding comfortably along under a single mayor and a mutual board of aldermen. There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact."
(Mark Twain, Life on the Mississippi)
One of my all time favorite quotes about mathematics. But then I love the writings of Mark Twain, and I love mathematics, so I might be biased.
interp1 does indeed use ONLY the two data points that bound the point in question for interpolation. Extrapolation will use the pair at the appropriate end. (That is with the 'linear' option. But you really don't want to use a spline to extrapolate.)
If your goal is to do extrapolation in a way that utilizes the entire set of data, then you need to do something more intelligent. That may involve the Curve Fitting Toolbox, where you will need to pose an intelligently chosen model, one that is appropriate for your data. If you do so, then extrapolating will indeed use that model, so it effectively uses the entire data set.
Or it might involve a tool like my SLM toolbox, which is able (if used in a rational way) to extrapolate based on an entire set of data.
However, MATLAB does NOT have any tool that can just take any set of data and extrapolate it, based on no intelligent input from the user. If they ever did try to provide such a tool, I would strenuously use any influence I have to tell them NOT to do so.
Stephen23 on 11 May 2019
Edited: Stephen23 on 11 May 2019
"does MATLAB have any function which uses all the data points to extrapolate?"
Read about the Curve Fitting Toolbox.
Of course this relies on you intelligently selecting a suitable curve to fit, based on the underlying physical system that you are modelling...
Yixuan Ren on 11 May 2019
Yes, I finally solved it with polyfit. Actually this is an even more straightforward idea, but the assignment misled me by saying "extrapolate", so I rushed straight into interp1 at the beginning.
Thank you so much for the detailed explanations. (Btw it's my first time saying something in this forum and the speed of responses here shocks me. Thank you again guys. XD)
Polyfit is generally a good choice, as long as you do not use too high an order model for the fit.
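A sketch of why the order matters, fitting polynomials of several degrees to the data from the question and evaluating each at x = 2000. The degrees 1, 2 and 6 are arbitrary choices for illustration, and the data are assumed to be exactly the displayed values:

```matlab
% Data exactly as displayed in the question (ASSUMED to be the stored values).
Height  = [0 386.5 446.1 526.6 621.5 660.6 711.4 734.9 792.8 810.2 817.9 893.7 1136.8 1317 1420.2 1426.2];
Density = [1.225 1.216 1.203 1.194 1.182 1.178 1.171 1.169 1.161 1.160 1.160 1.151 1.122 1.101 1.091 1.091];

xq = 2000;            % extrapolation point beyond the data
degrees = [1 2 6];    % arbitrary polynomial degrees to compare
yq = zeros(size(degrees));

for i = 1:numel(degrees)
    % The third output mu centers and scales x, which keeps
    % higher-degree fits better conditioned.
    [p, ~, mu] = polyfit(Height, Density, degrees(i));

    % polyval needs the same mu to apply the matching scaling to xq.
    yq(i) = polyval(p, xq, [], mu);

    fprintf('degree %d: y(%g) = %.4f\n', degrees(i), xq, yq(i));
end
```

All three fits follow the data closely inside the sampled range, so any disagreement between them at x = 2000 comes from the choice of model rather than from the data.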
Remember that if extrapolation were an easy thing to do, we would be able to predict the weather years in advance. At best, what you can say about next summer is that it will be sunny on some days and rainy on others.
I'll add what is my favorite quote about mathematics:
"In the space of one hundred and seventy six years the Lower Mississippi has shortened itself two hundred and forty-two miles. That is an average of a trifle over a mile and a third per year. Therefore, any calm person, who is not blind or idiotic, can see that in the Old Oölitic Silurian Period, just a million years ago next November, the Lower Mississippi was upwards of one million three hundred thousand miles long, and stuck out over the Gulf of Mexico like a fishing-pole. And by the same token any person can see that seven hundred and forty-two years from now the Lower Mississippi will be only a mile and three-quarters long, and Cairo [Illinois] and New Orleans will have joined their streets together and be plodding comfortably along under a single mayor and a mutual board of aldermen. There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact."
(Mark Twain, Life on the Mississippi)

Asked: 27 Mar 2018
Commented: 11 May 2019
