Curve fitting with cubic spline

I am trying to interpolate a cumulated distribution of, e.g., i) number of people to ii) number of owned cars, showing that, e.g., the top 20% of people own much more than 20% of all cars; of course, 100% of people own 100% of cars. I also know that there are, e.g., 100mn people and 200mn cars.

Now coming to my code:

# import libraries (more than required here)
import pandas as pd
from scipy import interpolate
from scipy.interpolate import interp1d
from sympy import symbols, solve, Eq
import matplotlib.pyplot as plt
%matplotlib inline  # Jupyter notebook magic
import plotly.express as px

curve = pd.read_excel('inputs.xlsx', sheet_name='inputdata')

Input data (image: Curveplot): cumulated people (x) on the left, cumulated cars (y) on the right.

#Input data in list form (I am not sure how to interpolate from a list for the moment)
cumulatedpeople = [0, 0.453086, 0.772334, 0.950475, 0.978981, 0.999876, 0.999990, 1]
cumulatedcars= [0, 0.016356, 0.126713, 0.410482, 0.554976, 0.950073, 0.984913, 1]

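# NOTE: `points` was not defined in the original post; this assumes the sheet read
# into `curve` holds cumulated people and cumulated cars in its first two columns.
points = curve.to_numpy()
x, y = points[:, 0], points[:, 1]
interpolation = interp1d(x, y, kind='cubic')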

number_of_people_mn= 100000000

oneperson = 1 / number_of_people_mn
dataset = pd.DataFrame(range(number_of_people_mn + 1))
dataset.columns = ["nr_of_one_person"]
dataset.drop(dataset.index[:1], inplace=True)

#calculating the position of every single person on the cumulated x-axis (between 0 and 1)
dataset["cumulatedpeople"] = dataset["nr_of_one_person"] / number_of_people_mn

#finding the "cumulatedcars" to the "cumulatedpeople" via interpolation (between 0 and 1)
dataset["cumulatedcars"] = interpolation(dataset["cumulatedpeople"])

plt.plot(dataset["cumulatedpeople"], dataset["cumulatedcars"])
plt.legend(['Cubic interpolation'], loc = 'best')
plt.xlabel('Cumulated people')
plt.ylabel('Cumulated cars')
plt.title("People-to-car cumulated curve")
plt.show()

However, when looking at the actual plot, I get the following result, which is wrong (image: Cubic interpolation).

In fact, the curve should look almost like the one from a linear interpolation with the exact same input data; however, linear interpolation is not accurate enough for my purpose (image: Linear interpolation).

Is there any relevant step I am missing, or what would be the best way to get an accurate interpolation from the inputs that looks almost like the one from a linear interpolation?

Short answer: your code is doing the right thing, but the data is unsuitable for cubic interpolation.

Let me explain. Here is your code, which I simplified for clarity:

import numpy as np
from scipy.interpolate import interp1d
from matplotlib import pyplot as plt

cumulatedpeople = [0, 0.453086, 0.772334, 0.950475, 0.978981, 0.999876, 0.999990, 1]
cumulatedcars= [0, 0.016356, 0.126713, 0.410482, 0.554976, 0.950073, 0.984913, 1]
interpolation = interp1d(cumulatedpeople, cumulatedcars, kind = 'cubic')

number_of_people_mn = 100  # reduced from 100000000
cumppl = np.arange(number_of_people_mn + 1)/number_of_people_mn
cumcars = interpolation(cumppl)
plt.plot(cumppl, cumcars)
plt.plot(cumulatedpeople, cumulatedcars,'o')
plt.show()

Note the last couple of lines: I am plotting, on the same graph, both the interpolated results and the input data. Here is the result (image: spline 1).

The orange dots are the original data; the blue line is the cubic interpolation. The interpolator passes through all the points, so technically it is doing the right thing.

Clearly, it is not doing what you would want.

The reason for this strange behavior lies mostly at the right end, where you have a few x-points that are very close together: the interpolator produces massive wiggles trying to fit very closely spaced points.
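To make the uneven spacing concrete, here is a small sketch that prints the gaps between consecutive x-points and measures how far the cubic interpolant strays on a dense grid. The 10001-point grid and the variable names are my own choices for illustration and are not part of the original code.

```python
import numpy as np
from scipy.interpolate import interp1d

cumulatedpeople = [0, 0.453086, 0.772334, 0.950475, 0.978981, 0.999876, 0.999990, 1]
cumulatedcars = [0, 0.016356, 0.126713, 0.410482, 0.554976, 0.950073, 0.984913, 1]

# Gaps between consecutive x-points: they differ by several orders of magnitude.
gaps = np.diff(cumulatedpeople)
spacing_ratio = gaps.max() / gaps.min()
print("gaps:", gaps)
print("largest gap / smallest gap:", spacing_ratio)

# Evaluate the cubic interpolant on a dense grid (grid size is an arbitrary choice).
cubic = interp1d(cumulatedpeople, cumulatedcars, kind='cubic')
grid = np.linspace(0.0, 1.0, 10001)
cubic_vals = cubic(grid)

# A cumulated share should stay within [0, 1] and should never decrease,
# so these two diagnostics show how badly the interpolant misbehaves.
print("min / max of cubic interpolant:", cubic_vals.min(), cubic_vals.max())
print("number of decreasing steps:", int(np.sum(np.diff(cubic_vals) < 0)))
```

Any value outside [0, 1], or any decreasing step, indicates a wiggle that a cumulated distribution cannot have.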

If I remove the two right-most points from the interpolator:

interpolation = interp1d(cumulatedpeople[:-2], cumulatedcars[:-2], kind = 'cubic')

it looks a bit more reasonable (image: spline 2).

But one would still argue that linear interpolation is better. The wiggles are now at the left end, because the gaps between the initial x-points are too large.

The moral here is that cubic interpolation should really be used only if the gaps between the x-points are roughly the same.

Your best bet here, I think, is to use something like curve_fit.
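As a rough sketch of what a curve_fit approach could look like: the answer does not name a model, so the one-parameter power curve y = x**k below is purely a hypothetical choice of mine, picked only because it passes through (0, 0) and (1, 1). The function name, starting value, and bounds are likewise assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.array([0, 0.453086, 0.772334, 0.950475, 0.978981, 0.999876, 0.999990, 1])
y = np.array([0, 0.016356, 0.126713, 0.410482, 0.554976, 0.950073, 0.984913, 1])

def power_curve(x, k):
    """Hypothetical model (an assumption, not from the original post): y = x**k."""
    return x ** k

# Least-squares fit of the single exponent k; p0 and bounds are arbitrary choices.
popt, pcov = curve_fit(power_curve, x, y, p0=[5.0], bounds=(1.0, 100.0))
k_fit = popt[0]

# Unlike interpolation, a fitted curve does not pass through the data points,
# so inspect the residuals before trusting it.
residuals = y - power_curve(x, k_fit)
print("fitted exponent k:", k_fit)
print("largest absolute residual:", np.abs(residuals).max())
```

A single power curve is probably too rigid for this data; the point is only that a fitted parametric curve is smooth and monotone by construction, at the cost of no longer reproducing the inputs exactly.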

A related discussion can be found here.

Specifically, monotone interpolation, as explained here, yields good results on your data. Copying the relevant bits here, you would replace the interpolator with

from scipy.interpolate import pchip
interpolation = pchip(cumulatedpeople, cumulatedcars)

and get a decent-looking fit (image: monotone).
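For completeness, here is a minimal sketch of how the monotone interpolant could be checked and then used for the question's original purpose. It uses PchipInterpolator, the documented SciPy class for PCHIP interpolation. It assumes that people are sorted from fewest to most cars owned, so that the share of the top 20% is one minus the curve at 0.8; the 200mn total is taken from the question, and the grid size and variable names are my own.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

cumulatedpeople = [0, 0.453086, 0.772334, 0.950475, 0.978981, 0.999876, 0.999990, 1]
cumulatedcars = [0, 0.016356, 0.126713, 0.410482, 0.554976, 0.950073, 0.984913, 1]

monotone = PchipInterpolator(cumulatedpeople, cumulatedcars)

# Check on a dense grid that the curve never decreases (tiny tolerance for rounding).
grid = np.linspace(0.0, 1.0, 10001)
values = monotone(grid)
is_monotone = bool(np.all(np.diff(values) >= -1e-12))
print("monotone on grid:", is_monotone)

# Assumption: people are ordered from fewest to most cars owned,
# so the top 20% of people own 1 - curve(0.8) of all cars.
total_cars = 200_000_000  # 200mn cars, as stated in the question
top20_share = 1.0 - float(monotone(0.8))
cars_top20 = top20_share * total_cars
print("share of cars owned by the top 20%:", top20_share)
print("cars owned by the top 20%:", cars_top20)
```

Because the interpolant is monotone, the computed share is guaranteed to lie between the shares implied by the two neighbouring data points, which is not the case for the cubic interpolant.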
