My train/test model returns an error, and are the train/test model and a normal linear regression model two separate models?

Question

I recently attended a class where the instructor was teaching us how to create a linear regression model using Python. Here is my linear regression model:

import matplotlib.pyplot as plt
import pandas as pd
from scipy import stats
import numpy as np
from sklearn.metrics import r2_score

#Define the path for the file
path=r"C:\Users\H\Desktop\Files\Data.xlsx"

#Read the file into a dataframe ensuring to group by weeks
df=pd.read_excel(path, sheet_name = 0)
df=df.groupby(['Week']).sum()
df = df.reset_index()

#Define x and y
x=df['Week']
y=df['Payment Amount Total']

#Draw the scatter plot
plt.scatter(x, y)
plt.show()

#Now we draw the line of linear regression

#First we want to look for these values
slope, intercept, r, p, std_err = stats.linregress(x, y)

#We then create a function
def myfunc(x):
    #Below is y = mx + c
    return slope * x + intercept

#Run each value of the x array through the function. This will result in a new array with new values for the y-axis:
mymodel = list(map(myfunc, x))

#We plot the scatter plot and line
plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()

#We print the value of r
print(r)

#We predict what the cost will be in week 23
print(myfunc(23))

The instructor said we must now use the train/test model to determine how accurate the model above is. This confused me a little, as I understood it to mean we would further refine the model above. Or does it simply mean we will use:

a normal linear regression model
a train/test model

and compare the r values the two different models yield, as well as the predicted values they yield? Is the train/test model considered a regression model?

I tried to create the train/test model, but I'm not sure if it's correct (the packages were imported in the example above). When I run the train/test code I get the following error:

ValueError: Found array with 0 sample(s) (shape=(0,)) while a minimum of 1 is required.

Here is the full code:

train_x = x[:80]
train_y = y[:80]

test_x = x[80:]
test_y = y[80:]

#I display the training set:
plt.scatter(train_x, train_y)
plt.show()

#I display the testing set:
plt.scatter(test_x, test_y)
plt.show()

mymodel = np.poly1d(np.polyfit(train_x, train_y, 4))

myline = np.linspace(0, 6, 100)

plt.scatter(train_x, train_y)
plt.plot(myline, mymodel(myline))
plt.show()

#Let's look at how well my training data fit in a polynomial regression?
mymodel = np.poly1d(np.polyfit(train_x, train_y, 4))
r2 = r2_score(train_y, mymodel(train_x))
print(r2)

#Now we want to test the model with the testing data as well
mymodel = np.poly1d(np.polyfit(train_x, train_y, 4))
r2 = r2_score(test_y, mymodel(test_x))
print(r2)

#Now we can use this model to predict new values:
    
#We predict what the total amount would be on the 23rd week:
print(mymodel(23))

Answer 1

You had better split the data into train and test sets using the sklearn method:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

Here X is your features dataframe and y is your label column. test_size=0.2 means 80% of the rows go to training and 20% to testing.
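
To make the split concrete, here is a minimal sketch of fitting the same kind of straight line on the training rows only and scoring it on the held-out rows. The data is synthetic (the weeks and totals arrays are assumptions standing in for your Week and Payment Amount Total columns), and random_state is only there to make the split repeatable.

```python
import numpy as np
from scipy import stats
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the weekly totals (assumption: not the asker's data)
rng = np.random.default_rng(0)
weeks = np.arange(1, 41)
totals = 50.0 * weeks + 200.0 + rng.normal(0.0, 25.0, size=weeks.size)

# Hold out 20% of the rows; random_state makes the split repeatable
x_train, x_test, y_train, y_test = train_test_split(
    weeks, totals, test_size=0.2, random_state=0
)

# Fit a straight line on the training rows only
fit = stats.linregress(x_train, y_train)

def predict(x):
    # y = mx + c, using the slope and intercept learned from the training rows
    return fit.slope * x + fit.intercept

# Compare how well the line explains seen and unseen rows
train_r2 = r2_score(y_train, predict(x_train))
test_r2 = r2_score(y_test, predict(x_test))
print(train_r2, test_r2)
```

Seen this way, train/test is a way of evaluating a regression model rather than a second kind of model: the line is still a linear regression, and the test score estimates how it behaves on data it was not fitted to.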

By the way, the error you describe could be because your dataframe has only 80 rows or fewer, which leaves x[80:] empty.
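
A small sketch of how that can happen, assuming a hypothetical 23 weekly rows after the groupby (your real row count may differ). Slicing past the end of a Series does not raise an error; it silently returns an empty Series, and the error only appears later when r2_score receives it. Computing the cut point from the length avoids this.

```python
import pandas as pd

# Hypothetical: 23 rows, one per week (assumption, not the asker's data)
x = pd.Series(range(1, 24))

# A fixed cut at 80 runs past the end, so the test slice is empty
empty_test = x[80:]
print(len(x), len(empty_test))  # 23 0

# Deriving the cut from the data length keeps both parts non-empty
split = int(len(x) * 0.8)
train_x, test_x = x[:split], x[split:]
print(len(train_x), len(test_x))  # 18 5
```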
