
How to predict a value with linear regression?

I want to predict the future behavior of my data. My x and y arrays each contain about 1000 values, and I want to predict the value y[1001]. Here is my example:

from numpy.random import randn
from numpy.random import seed
from numpy import sqrt
import numpy
from numpy import sum as arraysum
from scipy.stats import linregress
from matplotlib import pyplot

seed(1)
x = 20 * randn(1000) + 100
print(numpy.size(x))
y = x + (10 * randn(1000) + 50)
print(numpy.size(y))
# fit linear regression model
b1, b0, r_value, p_value, std_err = linregress(x, y)
# make predictions
yhat = b0 + b1 * x
# define new input, expected value and prediction
x_in = x[1001]
y_out = y[1001]
yhat_out = yhat[1001]
# estimate stdev of yhat
sum_errs = arraysum((y - yhat)**2)
stdev = sqrt(1/(len(y)-2) * sum_errs)
# calculate prediction interval
interval = 1.96 * stdev
print('Prediction Interval: %.3f' % interval)
lower, upper = y_out - interval, y_out + interval
print('95%% likelihood that the true value is between %.3f and %.3f' % (lower, upper))
print('True value: %.3f' % yhat_out)
# plot dataset and prediction with interval
pyplot.scatter(x, y)
pyplot.plot(x, yhat, color='red')
pyplot.errorbar(x_in, yhat_out, yerr=interval, color='black', fmt='o')
pyplot.show()

When I run it, I get this error:

     x_in = x[1001]
IndexError: index 1001 is out of bounds for axis 0 with size 1000

My goal is to predict the future behavior of my data and to evaluate the prediction by plotting its error bars too. I have seen the example "How do you create a linear regression forecast on time series data in Python", but I don't understand how to apply it to my data. I also found that it is possible to use an ARIMA model. How could I do that?

x = 20 * randn(1000) + 100

^ Here you are creating the input vector x with only 1000 values.

y = x + (10 * randn(1000) + 50)

^ And here you are creating the output vector y, again with only 1000 values.

So when you do x_in = x[1001], you are referring to an element that does not exist: the vector contains only 1000 elements, with valid indices 0 through 999.
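The index bound can be checked directly. This is a minimal sketch in which a zero-filled array stands in for the question's x; the names last_index and out_of_bounds are only illustrative.

```python
import numpy as np

x = np.zeros(1000)        # stand-in for the question's 1000-element x
last_index = len(x) - 1   # NumPy indices start at 0, so the last valid one is 999

try:
    x[1001]               # same access as in the question
    out_of_bounds = False
except IndexError:
    out_of_bounds = True

print('last valid index:', last_index)
print('x[1001] raises IndexError:', out_of_bounds)
```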

A quick fix would be:

seed(1)
x = 20 * randn(1001) + 100
print(numpy.size(x))
y = x + (10 * randn(1001) + 50)
print(numpy.size(y))
# fit linear regression model
b1, b0, r_value, p_value, std_err = linregress(x[:1000], y[:1000])
# make predictions
yhat = b0 + b1 * x
# define new input, expected value and prediction
# (index 1000 is the 1001st point, which was held out of the fit)
x_in = x[1000]
y_out = y[1000]
yhat_out = yhat[1000]
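The quick fix holds out a point whose y is already known. If the aim is to predict y for an input that is not in the data at all, the fitted line can be evaluated at that input instead of indexing into the array. The following is a sketch under a few assumptions: it uses numpy.polyfit with degree 1, which gives the same least-squares line as linregress; x_new = 130.0 is a hypothetical input; and the interval is the same rough 1.96 * stdev approximation used in the question, which ignores the uncertainty in the fitted coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)
x = 20 * rng.standard_normal(1000) + 100
y = x + (10 * rng.standard_normal(1000) + 50)

# Fit y = b0 + b1 * x by least squares; a degree-1 polyfit returns [slope, intercept].
b1, b0 = np.polyfit(x, y, 1)

# Residual standard deviation with n - 2 degrees of freedom.
yhat = b0 + b1 * x
stdev = np.sqrt(np.sum((y - yhat) ** 2) / (len(y) - 2))

# Predict for a new input that was never part of x (hypothetical value).
x_new = 130.0
y_new = b0 + b1 * x_new

# Approximate 95% interval, centred on the prediction.
interval = 1.96 * stdev
lower, upper = y_new - interval, y_new + interval
print('Predicted %.3f, approx. 95%% interval [%.3f, %.3f]' % (y_new, lower, upper))
```

The values x_new, y_new and interval can then be passed to pyplot.errorbar in the same way as x_in, yhat_out and interval in the question.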

Here is code for a graphing polynomial fitter that fits a first-order polynomial, using numpy.polyfit() to perform the fit and numpy.polyval() to predict values. You can experiment with different polynomial orders by changing the line "polynomialOrder = 1" near the top of the code.

import numpy, matplotlib
import matplotlib.pyplot as plt

xData = numpy.array([1.1, 2.2, 3.3, 4.4, 5.0, 6.6, 7.7, 0.0])
yData = numpy.array([1.1, 20.2, 30.3, 40.4, 50.0, 60.6, 70.7, 0.1])

polynomialOrder = 1 # example straight line

# curve fit the test data
fittedParameters = numpy.polyfit(xData, yData, polynomialOrder)
print('Fitted Parameters:', fittedParameters)

modelPredictions = numpy.polyval(fittedParameters, xData)
absError = modelPredictions - yData

SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
print('RMSE:', RMSE)
print('R-squared:', Rsquared)

print()


##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData,  'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = numpy.polyval(fittedParameters, xModel)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot

graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
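Once the parameters are fitted, numpy.polyval() can also be called on an x value that is not in the data, which is what predicting the next value amounts to. A short sketch, assuming the same sample data as above; x_new = 9.0 is a hypothetical input outside the observed range, and using the RMSE as a rough error-bar size is an assumption made here, not part of the code above.

```python
import numpy as np

x_data = np.array([1.1, 2.2, 3.3, 4.4, 5.0, 6.6, 7.7, 0.0])
y_data = np.array([1.1, 20.2, 30.3, 40.4, 50.0, 60.6, 70.7, 0.1])

# Straight-line fit; fitted[0] is the slope, fitted[1] the intercept.
fitted = np.polyfit(x_data, y_data, 1)

# Evaluate the fitted line at a new input (hypothetical value).
x_new = 9.0
y_pred = np.polyval(fitted, x_new)

# RMSE of the fit, used here only as a rough indication of the error size.
residuals = np.polyval(fitted, x_data) - y_data
rmse = np.sqrt(np.mean(residuals ** 2))

print('Prediction at x = %.1f: %.3f (RMSE of fit: %.3f)' % (x_new, y_pred, rmse))
```

Extrapolating beyond the observed x range assumes the linear trend continues there, so such predictions deserve more caution than ones inside the range.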

