Simple prediction using linear regression with Python

data2 = pd.DataFrame(data1['kwh'])
data2
                          kwh
date    
2012-04-12 14:56:50     1.256400
2012-04-12 15:11:55     1.430750
2012-04-12 15:27:01     1.369910
2012-04-12 15:42:06     1.359350
2012-04-12 15:57:10     1.305680
2012-04-12 16:12:10     1.287750
2012-04-12 16:27:14     1.245970
2012-04-12 16:42:19     1.282280
2012-04-12 16:57:24     1.365710
2012-04-12 17:12:28     1.320130
2012-04-12 17:27:33     1.354890
2012-04-12 17:42:37     1.343680
2012-04-12 17:57:41     1.314220
2012-04-12 18:12:44     1.311970
2012-04-12 18:27:46     1.338980
2012-04-12 18:42:51     1.357370
2012-04-12 18:57:54     1.328700
2012-04-12 19:12:58     1.308200
2012-04-12 19:28:01     1.341770
2012-04-12 19:43:04     1.278350
2012-04-12 19:58:07     1.253170
2012-04-12 20:13:10     1.420670
2012-04-12 20:28:15     1.292740
2012-04-12 20:43:15     1.322840
2012-04-12 20:58:18     1.247410
2012-04-12 21:13:20     0.568352
2012-04-12 21:28:22     0.317865
2012-04-12 21:43:24     0.233603
2012-04-12 21:58:27     0.229524
2012-04-12 22:13:29     0.236929
2012-04-12 22:28:34     0.233806
2012-04-12 22:43:38     0.235618
2012-04-12 22:58:43     0.229858
2012-04-12 23:13:43     0.235132
2012-04-12 23:28:46     0.231863
2012-04-12 23:43:55     0.237794
2012-04-12 23:59:00     0.229634
2012-04-13 00:14:02     0.234484
2012-04-13 00:29:05     0.234189
2012-04-13 00:44:09     0.237213
2012-04-13 00:59:09     0.230483
2012-04-13 01:14:10     0.234982
2012-04-13 01:29:11     0.237121
2012-04-13 01:44:16     0.230910
2012-04-13 01:59:22     0.238406
2012-04-13 02:14:21     0.250530
2012-04-13 02:29:24     0.283575
2012-04-13 02:44:24     0.302299
2012-04-13 02:59:25     0.322093
2012-04-13 03:14:30     0.327600
2012-04-13 03:29:31     0.324368
2012-04-13 03:44:31     0.301869
2012-04-13 03:59:42     0.322019
2012-04-13 04:14:43     0.325328
2012-04-13 04:29:43     0.306727
2012-04-13 04:44:46     0.299012
2012-04-13 04:59:47     0.303288
2012-04-13 05:14:48     0.326205
2012-04-13 05:29:49     0.344230
2012-04-13 05:44:50     0.353484
...

65701 rows × 1 columns

I have this DataFrame with a date index and one column. I want to make a simple prediction using linear regression with sklearn. I'm confused about how to set X and y (I want the X values to be the time and the y values to be the kwh). I'm new to Python, so any help is appreciated. Thank you.

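The first thing you have to do is split your data into two arrays, X and y. Each element of X will be a date, and the corresponding element of y will be the associated kwh. Because LinearRegression works on numeric features, the dates have to be converted to numbers first (for example, seconds since the first reading), and X must be two-dimensional, with one row per sample.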

Once you have that, you will want to use sklearn.linear_model.LinearRegression to do the regression. See the scikit-learn documentation for LinearRegression.

As with every sklearn model, there are two steps. First, fit the model to your data. Then put the dates for which you want to predict the kwh in another array, X_predict, and predict the kwh using the predict method.

from sklearn.linear_model import LinearRegression

X = []  # put your dates in here
y = []  # put your kwh in here

model = LinearRegression()
model.fit(X, y)

X_predict = []  # put the dates of which you want to predict kwh here
y_predict = model.predict(X_predict)
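
The placeholders above can be filled in along these lines. This is only a sketch: the small DataFrame is a hypothetical stand-in for data2, and converting dates to seconds since the first reading is one possible numeric encoding, not the only one.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for data2: a kwh column on a DatetimeIndex named "date"
index = pd.date_range("2012-04-12 15:00:00", periods=8, freq="15min", name="date")
data2 = pd.DataFrame(
    {"kwh": [1.25, 1.43, 1.37, 1.36, 1.31, 1.29, 1.25, 1.28]}, index=index
)

# Seconds elapsed since the first reading, reshaped to (n_samples, 1)
start = data2.index[0]
X = (data2.index - start).total_seconds().to_numpy().reshape(-1, 1)
y = data2["kwh"].to_numpy()

model = LinearRegression()
model.fit(X, y)

# Predict at a new time by applying the same conversion to it
new_time = pd.Timestamp("2012-04-12 17:00:00")
X_predict = np.array([[(new_time - start).total_seconds()]])
y_predict = model.predict(X_predict)
```

Whatever encoding is chosen for the dates, the same conversion has to be applied to the training data and to the dates being predicted.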

The predict() method takes a two-dimensional array as its argument. So, to predict a value with a simple linear regression, pass the input wrapped in a two-dimensional array. A bare date is not valid Python and is not numeric, so convert it to a number first. For example, assuming the model was fitted on Unix timestamps in seconds:

import pandas as pd
model.predict([[pd.Timestamp('2012-04-13 05:55:30').timestamp()]])

For a multiple linear regression, pass one value per feature in the inner list:

model.predict([[pd.Timestamp('2012-04-13 05:44:50').timestamp(), 0.327433]])
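
The shape requirement can be illustrated with a small sketch. The numbers below are hypothetical and chosen to lie exactly on y = 3x + 2; the point is only that predict expects an array of shape (n_samples, n_features), so a 1-D array has to be reshaped first.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical numeric data lying exactly on y = 3x + 2
X = np.array([[0.0], [1.0], [2.0], [3.0]])  # shape (4, 1): four samples, one feature
y = np.array([2.0, 5.0, 8.0, 11.0])
model = LinearRegression().fit(X, y)

# One sample with one feature: a list inside a list
single = model.predict([[4.0]])

# A 1-D array of feature values must be reshaped to (n_samples, 1)
values = np.array([5.0, 6.0])
several = model.predict(values.reshape(-1, 1))
```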

Linear regression:

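import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = pd.read_csv('Salary_Data.csv')
X = data.iloc[:, :-1].values  # all columns except the last, kept 2-D
y = data.iloc[:, 1].values    # second column as the target

# Split the dataset into training and test sets
# (sklearn.cross_validation was removed; train_test_split lives in sklearn.model_selection)
from sklearn.model_selection import train_test_split
# An integer test_size means an absolute number of test samples (here 10)
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=10, random_state=0)

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, Y_train)
y_pre = regressor.predict(X_test)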

You can have a look at my code on GitHub, where I predict temperature from cricket chirps with a simple linear regression model. I have explained the code with comments.

#Import the libraries required
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

#Importing the excel data 
# Raw string so the backslashes in the Windows path are not treated as escape sequences
dataset = pd.read_excel(r'D:\MachineLearing\Machine Learning A-Z Template Folder\Part 2 - Regression\Section 4 - Simple Linear Regression\CricketChirpsVs.Temperature.xls')

x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values

#Split the data into train and test dataset
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=1/3,random_state=42)

#Fitting Simple Linear regression data model to train data set
from sklearn.linear_model import LinearRegression
regressorObject=LinearRegression()
regressorObject.fit(x_train,y_train)

#predict the test set
y_pred_test_data=regressorObject.predict(x_test)


# Visualising the Training set results in a scatter plot
plt.scatter(x_train, y_train, color = 'red')
plt.plot(x_train, regressorObject.predict(x_train), color = 'blue')
plt.title('Cricket Chirps vs Temperature (Training set)')
plt.xlabel('Cricket Chirps (chirps/sec for the striped ground cricket) ')
plt.ylabel('Temperature (in degrees Fahrenheit)')
plt.show()

# Visualising the test set results in a scatter plot
plt.scatter(x_test, y_test, color = 'red')
plt.plot(x_train, regressorObject.predict(x_train), color = 'blue')
plt.title('Cricket Chirps vs Temperature (Test set)')
plt.xlabel('Cricket Chirps (chirps/sec for the striped ground cricket) ')
plt.ylabel('Temperature (in degrees Fahrenheit)')
plt.show()

For more information, please visit:

https://github.com/wins999/Cricket_Chirps_Vs_Temprature--Simple-Linear-Regression-in-Python-

Split the dataset into a training set and a test set:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

Train the simple linear regression model on the training set:

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

Predict the test set results:

y_predict = regressor.predict(X_test)
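
The three steps can be run end to end as in the sketch below. The data is synthetic (a hypothetical line y = 0.5x + 1 with a little noise), and the mean absolute error at the end is just one way to compare the predictions with the held-out values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical data: y = 0.5x + 1 plus a little Gaussian noise
rng = np.random.default_rng(0)
X = np.arange(50, dtype=float).reshape(-1, 1)
y = 0.5 * X.ravel() + 1.0 + rng.normal(scale=0.1, size=50)

# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_predict = regressor.predict(X_test)

# Average absolute gap between predicted and held-out values
mae = mean_absolute_error(y_test, y_predict)
```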

You can use the following code.

import pandas as pd
from sklearn.linear_model import LinearRegression  # to build the linear regression model
from sklearn.model_selection import train_test_split  # to split the dataset

data2 = pd.DataFrame(data1['kwh'])
data2 = data2.reset_index()  # creates a new index (0 to 65700), so date becomes a regular column
dates = data2.iloc[:, 0]  # date column (assumed here to have a datetime dtype)
# LinearRegression needs a numeric, two-dimensional X, so use seconds since the first reading
X = (dates - dates.min()).dt.total_seconds().to_frame()
y = data2.iloc[:, -1]  # kwh column

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, train_size=0.80, random_state=20)

linearModel = LinearRegression()
linearModel.fit(Xtrain, ytrain)
ypred = linearModel.predict(Xtest)

Here ypred holds the predicted kwh values for the test set. LinearRegression predicts continuous values, not probabilities.

Just in case someone is looking for a solution without sklearn:

import numpy as np
import pandas as pd

def variance(values, mean):
    return sum([(val-mean)**2 for val in values])

def covariance(x, mean_x, y , mean_y):
    covariance = 0.0
    for r in range(len(x)):
        covariance = covariance + (x[r] - mean_x) * (y[r] - mean_y)
    return covariance

def get_coef(df):
    mean_x = sum(df['x']) / float(len(df['x']))
    mean_y = sum(df['y']) / float(len(df['y']))
    variance_x = variance(df['x'], mean_x)
    #variance_y = variance(df['y'], mean_y)
    covariance_x_y = covariance(df['x'],mean_x,df['y'],mean_y)
    m = covariance_x_y / variance_x
    c = mean_y - m * mean_x
    return m,c

def get_y(x,m,c):
    return m*x+c
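
The same closed-form fit, slope = covariance(x, y) / variance(x) and intercept = mean_y - slope * mean_x, can be checked on toy data. The sketch below is a compact NumPy rewrite rather than the functions above; fit_line is a hypothetical name, and the data is made up to lie exactly on y = 2x + 1.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares line: m = cov(x, y) / var(x), c = mean_y - m * mean_x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    # The 1/n factors of covariance and variance cancel, so plain sums are enough
    m = np.sum((x - mean_x) * (y - mean_y)) / np.sum((x - mean_x) ** 2)
    c = mean_y - m * mean_x
    return m, c

# Hypothetical toy data lying exactly on y = 2x + 1
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]
m, c = fit_line(x, y)

# Cross-check against NumPy's own degree-1 polynomial fit
m_ref, c_ref = np.polyfit(x, y, 1)
```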

Inspired by https://github.com/dhirajk100/Linear-Regression-from-Scratch-in-Python/blob/master/Linear%20Regression%20%20from%20Scratch%20Without%20Sklearn.ipynb
