
Best fitting line for a scatter plot

Is there any way to find the best-fitting line for a scatter plot if I don't know the relationship between the two axes? (If I knew it, I could have used scipy.optimize.) My scatter plot looks something like this:

[image: scatter plot]

I would like to have a line like this: [image: expected result]. I also need to get the points of the best-fitting line for my further calculations.

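import itertools
from decimal import Decimal
import matplotlib.pyplot as plt

# lat, input_file, p and vmr are defined earlier in my script
for j in lat:
    l = 94 * j
    i = l - 92
    for lines in itertools.islice(input_file, i, l):
        lines = lines.split()
        # first column goes to p, fourth column to vmr
        p.append(float(Decimal(lines[0])))
        vmr.append(float(Decimal(lines[3])))

# plot once, after all points have been collected
plt.scatter(vmr, p)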

You can use LOWESS (Locally Weighted Scatterplot Smoothing), a non-parametric regression method.

Statsmodels has an implementation (statsmodels.nonparametric.smoothers_lowess.lowess) that you can use to fit your own smoother.

See the StackOverflow question on visualizing nonlinear relationships in scatter plots for an example using the Statsmodels implementation.
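Since the question asks for the points of the fitted line, here is a minimal sketch of calling the Statsmodels function directly. The synthetic data, the seed, and the choice frac=0.3 are assumptions made for illustration; they are not taken from the question, and frac usually needs tuning for real data.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

# Synthetic noisy data (an assumption for illustration only)
rng = np.random.default_rng(0)
x = np.arange(0, 10, 0.01)
ytrue = np.exp(-x / 5.0) + 2 * np.sin(x / 3.0)
y = ytrue + rng.normal(size=len(x))

# lowess takes the response (y) first, then the predictor (x).
# frac is the fraction of the data used for each local fit.
fitted = lowess(y, x, frac=0.3)

# With the default return_sorted=True, the result is sorted by x:
# column 0 holds the x values, column 1 the smoothed y values.
x_fit, y_fit = fitted[:, 0], fitted[:, 1]
```

x_fit and y_fit can then be passed to plt.plot or used in later calculations. A larger frac gives a smoother line, a smaller one follows the data more closely.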

You could also use the implementation in the Seaborn visualization library's regplot() function with the keyword argument lowess=True. See the Seaborn documentation for details.

The following code is an example using Seaborn and the data from the StackOverflow question above:

import numpy as np
import seaborn as sns
sns.set_style("white")

x = np.arange(0, 10, 0.01)
ytrue = np.exp(-x / 5.0) + 2 * np.sin(x / 3.0)

# add random errors with a normal distribution
y = ytrue + np.random.normal(size=len(x))

# recent Seaborn versions require x and y to be passed as keyword arguments
sns.regplot(x=x, y=y, lowess=True, color="black",
            line_kws={"color": "magenta", "linewidth": 5})

[image: resulting plot]
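If you go the Seaborn route and still need the coordinates of the smoothed line, one possible approach is to read them back from the Axes that regplot() returns. This sketch assumes that the fitted curve is the first Line2D object on the Axes, which holds when the Axes contains nothing but this one regplot; the synthetic data and seed are also assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; drop for interactive use
import numpy as np
import seaborn as sns

# Synthetic noisy data (an assumption for illustration only)
rng = np.random.default_rng(0)
x = np.arange(0, 10, 0.01)
y = np.exp(-x / 5.0) + 2 * np.sin(x / 3.0) + rng.normal(size=len(x))

# regplot returns the matplotlib Axes it drew on
ax = sns.regplot(x=x, y=y, lowess=True, color="black",
                 line_kws={"color": "magenta"})

# The scatter points are a collection, so the only Line2D on a fresh Axes
# is the LOWESS curve (assumption: nothing else was plotted on this Axes).
x_fit, y_fit = ax.lines[0].get_data()
```

Calling the Statsmodels function yourself is the more direct option when the fitted values matter more than the figure, because regplot() does not expose the smoothing parameters.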

This probably isn't a matplotlib question, but I think you can do this kind of thing with pandas, using a rolling median.

smoothedData = dataSeries.rolling(10, center = True).median()

Actually, you can do a rolling median with anything, but pandas has a built-in function. NumPy may too.
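A slightly fuller sketch of the rolling-median idea, under the assumption that the scatter data are two arrays x and y in arbitrary order. The data must be sorted by x first, because the window runs over neighbouring rows. The synthetic data, the variable names, and the window size of 10 are illustrative only.

```python
import numpy as np
import pandas as pd

# Synthetic, unsorted scatter data (an assumption for illustration only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=500)
y = np.exp(-x / 5.0) + 2 * np.sin(x / 3.0) + rng.normal(size=x.size)

# Index y by x and sort, so that the rolling window covers neighbours in x
data_series = pd.Series(y, index=x).sort_index()

# Centered rolling median over 10 points; windows that are incomplete
# at the two ends yield NaN and are dropped here
smoothed = data_series.rolling(10, center=True).median().dropna()

# Points of the smoothed line, ready for plotting or further calculation
x_fit = smoothed.index.to_numpy()
y_fit = smoothed.to_numpy()
```

A wider window gives a smoother but less responsive line. Unlike LOWESS, the result is piecewise constant in places, so it can look jagged for small windows.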

