Is there a way to plot the ordinary least squares type of line on another plot?

I currently have a scatter plot of data points, and I want to draw a line that captures the general pattern of the data. I believe this is known as ordinary least squares regression, but I may be wrong, as I'm not completely familiar with the literature.

For example, if I had a plot like the following:

[image: scatter plot]

I just want a line through the data points that captures the most general trend.

I've tried methods like Scikit-Learn's LinearRegression, but then I'd have to split my data into train and test sets and perform regression. Is there a way to capture the general trend without having to do this?

Thank you.

Here is an example polynomial fitter that does this. If you convert your date format to a numeric type such as "elapsed days", you can substitute your data directly into the example. Here I use a second-order polynomial (quadratic) equation, set at the top of the code, because to my eye the trend of your data has some curvature rather than following a straight line.
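For the date conversion, one possible approach is sketched below. It assumes the dates are available as ISO-formatted strings; the sample dates and the names `dates` and `elapsed_days` are made up for illustration and are not from the question's data.

```python
import numpy as np

# Hypothetical dates, invented for illustration only
dates = np.array(
    ["2021-01-01", "2021-01-15", "2021-02-01", "2021-03-01"],
    dtype="datetime64[D]",
)

# Days elapsed since the earliest date, as plain floats.
# Dividing a timedelta64 array by a one-day timedelta yields float64.
elapsed_days = (dates - dates.min()) / np.timedelta64(1, "D")

print(elapsed_days)
```

The resulting `elapsed_days` array can then be used as the x values for a fit such as `numpy.polyfit`.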

[image: plot]

import numpy, matplotlib
import matplotlib.pyplot as plt

xData = numpy.array([1.1, 2.2, 3.3, 4.4, 5.0, 6.6, 7.7, 0.0])
yData = numpy.array([1.1, 20.2, 30.3, 40.4, 50.0, 60.6, 70.7, 0.1])

polynomialOrder = 2 # example quadratic

# curve fit the test data
fittedParameters = numpy.polyfit(xData, yData, polynomialOrder)
print('Fitted Parameters:', fittedParameters)

modelPredictions = numpy.polyval(fittedParameters, xData)
absError = modelPredictions - yData

SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
print('RMSE:', RMSE)
print('R-squared:', Rsquared)

print()


##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData,  'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = numpy.polyval(fittedParameters, xModel)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot

graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
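If a straight line is wanted rather than a curve, a polynomial order of 1 gives the ordinary least squares line, and no train/test split is needed because the line is fitted to all of the points. The sketch below reuses the sample data from the code above; the names `slope_cf` and `intercept_cf` and the output file name `ols_line.png` are assumptions for illustration. It also compares the `numpy.polyfit` result with the textbook closed-form OLS estimates as a sanity check.

```python
import numpy as np
import matplotlib.pyplot as plt

# Same sample data as in the example above
x = np.array([1.1, 2.2, 3.3, 4.4, 5.0, 6.6, 7.7, 0.0])
y = np.array([1.1, 20.2, 30.3, 40.4, 50.0, 60.6, 70.7, 0.1])

# Degree 1 fits y = slope * x + intercept by least squares
slope, intercept = np.polyfit(x, y, 1)

# Closed-form OLS estimates, computed only for comparison
slope_cf = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept_cf = y.mean() - slope_cf * x.mean()
print("polyfit:    ", slope, intercept)
print("closed form:", slope_cf, intercept_cf)

# Draw the fitted line on the same axes as the scatter plot
fig, ax = plt.subplots()
ax.scatter(x, y, label="data")
x_line = np.linspace(x.min(), x.max(), 100)
ax.plot(x_line, slope * x_line + intercept, color="red", label="OLS line")
ax.set_xlabel("X Data")
ax.set_ylabel("Y Data")
ax.legend()

# Hypothetical output file name; call plt.show() instead for an interactive window
fig.savefig("ols_line.png")
```

As far as I know, Scikit-Learn's LinearRegression can likewise be fitted on the whole data set; the split is only needed when you want to evaluate predictions on held-out data.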
