
How to determine the correct shape when plotting linear regression?

I am trying to visualize my linear regression model, but I can't figure out how to shape the data so that the regression results plot correctly. Below are the steps I took to fit the model, what the data looks like, and the error I'm getting.

X=sale[['Dec-2018','Nov-2018', 'Oct-2018','Sep-2018','Aug-2018','Jul-2018']]
y=sale[['CLV']]

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state=0)

from sklearn.linear_model import LinearRegression
linreg = LinearRegression()
linreg.fit(X_train, y_train)
y_pred = linreg.predict(X_test)

print (X)
print (y)
X.iloc[:,:] = labelencoder_X.fit_transform(X.iloc[:,:])
y.iloc[:,:1] = labelencoder_y.fit_transform(y.iloc[:,:1])
plt.scatter(X_test, y_test, color='black')
plt.plot(X_train, y_pred, color = 'green', linewidth=3)
plt.title('CLTV (Training set)')
plt.xlabel('Time')
plt.ylabel('CLV')
plt.show()

Below is the state of the data and the error I'm getting:

   month_year  Dec-2018  Nov-2018  Oct-2018  Sep-2018  Aug-2018  Jul-2018
0               0.00      0.00      0.00      0.00      0.00      0.00
1               0.00      0.00      0.00      0.00      0.00      0.00
2               0.00    286.40      0.00    825.92      0.00    902.09
3               0.00      0.00      0.00    521.50      0.00      0.00
4               0.00   6354.88  16471.77   2941.72  21706.44   2796.36
5               0.00      0.00      0.00    147.70      0.00      0.00
6               0.00      0.00      0.00      0.00      0.00      0.00
7               0.00    601.44    678.76      0.00    608.76   1064.08
8               0.00      0.00      0.00    519.89      0.00      0.00
9             438.50    312.73    675.38      0.00    301.70      0.00
10            998.61   9053.83   2149.30   5999.50    654.37   1070.59
11            763.06    572.59      0.00      0.00   1724.95      0.00
12            210.35      0.00    343.76    217.77      0.00      0.00
13              0.00      0.00      0.00      0.00      0.00      0.00
14              0.00      0.00      0.00    918.98      0.00      0.00
15              0.00      0.00      0.00    535.50    229.50      0.00
16              0.00      0.00    392.08      0.00      0.00      0.00
17            142.60    279.50      0.00    234.00      0.00      0.00
18            111.45    100.95    217.75      0.00      0.00      0.00
19            327.40      0.00    245.80     77.31    338.20      0.00
20              0.00      0.00      0.00      0.00      0.00      0.00
21              0.00    400.32      0.00   1210.32      0.00   2915.92
22              0.00      0.00      0.00      0.00      0.00      0.00
23              0.00    115.23      0.00    267.80      0.00      0.00
24              0.00      0.00      0.00      0.00      0.00    417.38
25              0.00      0.00      0.00      0.00      0.00      0.00
26              0.00      0.00    497.83      0.00      0.00    446.09
27              0.00      0.00      0.00      0.00      0.00      0.00
28              0.00    279.86      0.00      0.00      0.00      0.00
29            752.39   1070.14    387.80    692.24    330.44    653.00
...              ...       ...       ...       ...       ...       ...
3898            0.00    117.54    311.63    438.14    537.95    165.00
3899            0.00      0.00      0.00   1538.41      0.00      0.00
3900          874.45      0.00      0.00      0.00      0.00    361.48
3901            0.00    363.20      0.00      0.00      0.00      0.00
3902            0.00      0.00      0.00      0.00    297.06      0.00
3903            0.00     95.34      0.00      0.00      0.00      0.00
3904            0.00      0.00      0.00      0.00      0.00      0.00
3905            0.00      0.00      0.00   4314.72      0.00      0.00
3906            0.00      0.00    448.37      0.00      0.00      0.00
3907            0.00      0.00      0.00    103.30      0.00      0.00
3908            0.00      0.00    774.76      0.00    627.27      0.00
3909            0.00   1070.40      0.00    891.90      0.00      0.00
3910            0.00      0.00      0.00      0.00      0.00      0.00
3911            0.00      0.00     99.44    224.80      0.00      0.00
3912            0.00      0.00      0.00      0.00      0.00    149.48
3913            0.00    399.68      0.00      0.00      0.00    503.80
3914            0.00      0.00      0.00    312.96      0.00    488.55
3915            0.00      0.00      0.00      0.00      0.00     25.50
3916            0.00      0.00      0.00      0.00      0.00      0.00
3917            0.00    171.20      0.00      0.00      0.00      0.00
3918          367.88      0.00    604.25      0.00    372.25    753.66
3919            0.00      0.00      0.00      0.00      0.00      0.00
3920            0.00      0.00    329.61      0.00      0.00      0.00
3921            0.00      0.00    110.38      0.00      0.00      0.00
3922            0.00      0.00      0.00    173.90      0.00      0.00
3923            0.00      0.00      0.00      0.00      0.00      0.00
3924            0.00      0.00      0.00      0.00      0.00      0.00
3925           77.84      0.00      0.00      0.00     98.76      0.00
3926          208.00    637.71    112.99    134.90      0.00    139.89
3927            0.00      0.00   1072.00      0.00      0.00      0.00

[3928 rows x 6 columns]
month_year        CLV
0                0.00
1              401.90
2             2780.66
3             1150.80
4           121869.86
5              386.20
6             1760.96
7             5371.07
8              792.94
9             4196.01
10           29748.44
11            3822.90
12             942.34
13              92.72
14             918.98
15            1759.50
16             392.08
17            1468.12
18             430.15
19             988.71
20             253.05
21            6748.40
22             215.05
23             383.03
24             417.38
25             312.38
26            2595.24
27             134.10
28             670.65
29            5578.04
...               ...
3898          2058.09
3899          2232.49
3900          2527.10
3901           363.20
3902           793.52
3903            95.34
3904           342.92
3905          4314.72
3906           518.27
3907           103.30
3908          2274.03
3909          2338.60
3910          2128.57
3911           324.24
3912           149.48
3913           903.48
3914           801.51
3915            25.50
3916           138.90
3917           244.90
3918          2098.04
3919             0.00
3920           329.61
3921           110.38
3922           173.90
3923           180.60
3924            80.82
3925           176.60
3926          1929.93
3927          1837.28

[3928 rows x 1 columns]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-197-44d25a827a36> in <module>
      2 print (X)
      3 print (y)
----> 4 X.iloc[:,:] = labelencoder_X.fit_transform(X.iloc[:,:])
      5 y.iloc[:,:1] = labelencoder_y.fit_transform(y.iloc[:,:1])
      6 plt.scatter(X_test, y_test, color='black')

/anaconda3/lib/python3.7/site-packages/sklearn/preprocessing/label.py in fit_transform(self, y)
    233         y : array-like of shape [n_samples]
    234         """
--> 235         y = column_or_1d(y, warn=True)
    236         self.classes_, y = _encode(y, encode=True)
    237         return y

/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in column_or_1d(y, warn)
    795         return np.ravel(y)
    796 
--> 797     raise ValueError("bad input shape {0}".format(shape))
    798 
    799 

ValueError: bad input shape (3928, 6)

I've tried slicing the data in several ways, but it doesn't help. The problem probably has to do with dimensions, and that is exactly where I need help.

The labelencoder_X.fit_transform function probably returns a new object (a NumPy array or another DataFrame). The error occurs when you assign the returned object back to the same input using iloc.
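
One more thing worth checking: in the traceback, the ValueError is raised from column_or_1d inside fit_transform, and the docstring line shown there describes y as "array-like of shape [n_samples]". That suggests LabelEncoder only accepts 1-D input, so a (3928, 6) frame may be rejected before any assignment takes place. Here is a minimal sketch of that behaviour. The arrays X_demo and y_demo are made-up stand-ins, not the data from the question:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Made-up stand-ins for illustration only (not the question's data).
X_demo = np.array([[0.0, 286.4], [0.0, 0.0], [438.5, 312.73]])  # 2-D, like X
y_demo = np.array(["low", "high", "low"])                       # 1-D labels

encoder = LabelEncoder()

# A 1-D array of labels is the input LabelEncoder is designed for.
encoded = encoder.fit_transform(y_demo)
print(encoded)

# A 2-D feature matrix is rejected inside fit_transform itself,
# before the result could be assigned anywhere.
try:
    encoder.fit_transform(X_demo)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```

If that is what is happening here, note also that the columns in the question are already numeric, so LinearRegression can probably use them directly without any label encoding.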

You can try:

X_transf = labelencoder_X.fit_transform(X)
y_transf = labelencoder_y.fit_transform(y)

or assign to the variable name alone, without iloc.
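
As for the plot itself, two things in the question's code look likely to cause shape problems: plt.plot(X_train, y_pred) pairs the training rows with predictions made on the test rows, so the lengths differ, and with six feature columns there is no single x-axis to draw a line against. One common alternative is to plot predicted values against actual values. The sketch below assumes synthetic data in place of the sale DataFrame (the random values and the 200-row size are my own invention), so treat it as an illustration rather than a drop-in fix:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the question's data: 6 monthly columns, 1-D CLV target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1000, size=(200, 6))
y = X.sum(axis=1) + rng.normal(0, 50, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

linreg = LinearRegression()
linreg.fit(X_train, y_train)
y_pred = linreg.predict(X_test)  # one prediction per test row

# Both arrays have the same length, so they can be scattered against each other.
plt.scatter(y_test, y_pred, color='black')

# Reference line where predicted == actual.
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
plt.plot(lims, lims, color='green', linewidth=3)

plt.title('CLV: predicted vs. actual (test set)')
plt.xlabel('Actual CLV')
plt.ylabel('Predicted CLV')
plt.show()
```

Points close to the green line are predictions that match the actual values well. With the real data, y = sale['CLV'] (single brackets) would give a 1-D target like the one used here.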

Hope this helps.
