Regression results on train and test datasets

Question

My problem is that I am applying a simple linear regression to my data. When I split the data into train and test sets, the model is not significant on the test data (poor p-value, R-squared, and adjusted R-squared), while the results on the train data are good. Here is the code for more detail:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from scipy import stats
data = pd.read_excel ("C:\\Users\\AchourAh\\Desktop\\PL14_IPC_03_09_2018_SP_Level.xlsx",'Sheet1') #Import Excel file
data1 = data.fillna(0) #Replace null values of the whole dataset with 0
print(data1)
X = data1.iloc[0:len(data1),5].values.reshape(-1, 1) #Extract the column of the COPCOR SP we are going to check its impact
Y = data1.iloc[0:len(data1),6].values.reshape(-1, 1) #Extract the column of the PAUS SP
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size =0.3,  random_state = 0)
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, Y_train)
plt.scatter(X_train, Y_train, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('SP00114585')
plt.xlabel('COP COR Quantity')
plt.ylabel('PAUS Quantity')
plt.show()
plt.scatter(X_test, Y_test, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('SP00114585')
plt.xlabel('COP COR Quantity')
plt.ylabel('PAUS Quantity')
plt.show()
X2 = sm.add_constant(X_train)
est = sm.OLS(Y_train, X2)
est2 = est.fit()
print(est2.summary())
X3 = sm.add_constant(X_test)
est3 = sm.OLS(Y_test, X3)
est4 = est3.fit()
print(est4.summary())

In the end, when I display the statistical results, I always get good results on the train data but not on the test data. There is probably something wrong in my code. Note that I am a beginner with Python.

Answer 1

Try running this model a few times without specifying random_state in train_test_split, or with a different test_size parameter.

For example:

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size =0.2)

As it stands, every time you run the model you get exactly the same split of the data (because random_state is fixed), so it is possible that the model looks overfitted just because of that particular split.
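To see how much the results depend on the split, the fit can be repeated over several different splits and the train and test R-squared compared. The following is only a minimal sketch: the data is synthetic (a stand-in for the two Excel columns, which are not available here), and the helper name split_scores is made up for illustration. It loops over seeds instead of leaving random_state unset, which varies the split in the same way while keeping the run reproducible.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the two Excel columns (an assumption, not the asker's data).
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(60, 1))
Y = 2.0 * X + rng.normal(0, 25, size=(60, 1))


def split_scores(X, Y, test_size=0.3, n_splits=20):
    """Fit on a different random split each time; return (train R^2, test R^2) rows."""
    scores = []
    for seed in range(n_splits):
        # A different seed per iteration gives a different train/test split.
        X_train, X_test, Y_train, Y_test = train_test_split(
            X, Y, test_size=test_size, random_state=seed
        )
        regressor = LinearRegression().fit(X_train, Y_train)
        # score() returns R^2 of the fitted model on the data passed in.
        scores.append((regressor.score(X_train, Y_train),
                       regressor.score(X_test, Y_test)))
    return np.array(scores)


scores = split_scores(X, Y)
print("train R^2: mean %.3f, std %.3f" % (scores[:, 0].mean(), scores[:, 0].std()))
print("test  R^2: mean %.3f, std %.3f" % (scores[:, 1].mean(), scores[:, 1].std()))
```

If the test R-squared swings widely from one split to the next, the single split with random_state = 0 is probably not representative. Note also that this sketch scores the model fitted on the train rows against the test rows, whereas the second summary in the question comes from a separate OLS fit on the test rows alone, so it does not measure how the train model generalizes.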

Answer 1
0 2018-09-26 12:06:59
