Random Forest in Python [error in r2_score]

Question

I am new to Machine Learning and to Python. I am trying to build a Random Forest model to predict cement strength. There are two .csv files: train_data.csv and test_data.csv.

This is what I have done so far. I am trying to compute the r2_score here.

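import pandas as pd
from sklearn import metrics
from sklearn.ensemble import RandomForestRegressor

df=pd.read_csv("train_data(1).csv")
X=df.drop('strength',axis=1)
y=df['strength']
model=RandomForestRegressor()
model.fit(X,y)
X_test=pd.read_csv("test_data.csv")
y_pred=model.predict(X_test)
acc_R=metrics.r2_score(y,y_pred) # this line raises the ValueError
acc_R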

The problem is that y and y_pred have different shapes, so I get this error:

ValueError: Found input variables with inconsistent numbers of samples: [721, 309]

How do I correct this? Can someone explain what I am doing wrong?

Answer 1

You need to compare y_pred with y_test, not with the y you used to train the model:

acc_R=metrics.r2_score(y_test,y_pred)

For this to work, test_data.csv should contain its own list of labels (a 'strength' column), which becomes y_test.

Try the following:
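
To see why the lengths matter, here is a small sketch using synthetic arrays. The sizes 721 and 309 come from the error message in the question; the values themselves are random stand-ins, not the cement data. r2_score compares true and predicted values pairwise, so both arrays must have the same number of samples.

```python
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
y_train = rng.normal(size=721)  # stand-in for the training labels (y)
y_test = rng.normal(size=309)   # stand-in for the test labels (y_test)
# Stand-in for predictions made on the 309 test rows.
y_pred = y_test + rng.normal(scale=0.1, size=309)

# Comparing training labels with test predictions fails: 721 vs 309 samples.
try:
    r2_score(y_train, y_pred)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

# Comparing test labels with test predictions works: 309 vs 309 samples.
score = r2_score(y_test, y_pred)
```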

import pandas as pd
from sklearn import metrics
from sklearn.ensemble import RandomForestRegressor

df=pd.read_csv("train_data(1).csv")
X=df.drop('strength',axis=1)
y=df['strength']
model=RandomForestRegressor()
model.fit(X,y)
df1=pd.read_csv("test_data.csv") # read the test data
X_test=df1.drop('strength',axis=1) # features to predict from
y_test=df1['strength'] # true labels for X_test
y_pred=model.predict(X_test) # predicted labels for X_test
acc_R=metrics.r2_score(y_test,y_pred) # compare test labels with test predictions
acc_R
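
The question does not show whether test_data.csv actually has a 'strength' column, so it may be worth checking before dropping it. A minimal sketch, assuming a hypothetical helper named split_features_target and made-up feature columns ('cement', 'water') purely for illustration:

```python
import pandas as pd

def split_features_target(df, target="strength"):
    """Hypothetical helper: return (X, y), with y set to None if the target column is absent."""
    if target in df.columns:
        return df.drop(columns=[target]), df[target]
    return df, None

# Tiny illustrative frames; the column names and values are assumptions.
labelled = pd.DataFrame(
    {"cement": [540.0, 332.5], "water": [162.0, 228.0], "strength": [79.99, 40.27]}
)
unlabelled = labelled.drop(columns=["strength"])

X_a, y_a = split_features_target(labelled)    # y_a holds the labels
X_b, y_b = split_features_target(unlabelled)  # y_b is None, so r2_score cannot be computed
```

If y comes back as None, the test file has no ground truth and r2_score cannot be evaluated on it.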

Answer 2

import pandas as pd
from sklearn import metrics
from sklearn.ensemble import RandomForestRegressor

df_train = pd.read_csv("train_data(1).csv")
X_train = df_train.drop('strength', axis=1)
y_train = df_train['strength']
model = RandomForestRegressor()
model.fit(X_train, y_train)
df_test = pd.read_csv("test_data.csv")
X_test = df_test.drop('strength', axis=1)  # if your test data contains 'strength'
y_test = df_test['strength']  # if your test data contains 'strength'
y_pred = model.predict(X_test)
acc_R = metrics.r2_score(y_test, y_pred)
acc_R
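
The comments above only apply if the test file contains 'strength'. If it does not, one possible workaround (a different technique from the one in the answers) is to hold out part of the labelled training data and compute r2_score on that. A rough sketch, using make_regression as a synthetic stand-in for the training file; the 8 features, the 30% split, and random_state=0 are all assumptions:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the labelled training data (721 rows, as in the question).
X, y = make_regression(n_samples=721, n_features=8, noise=10.0, random_state=0)

# Hold out 30% of the labelled rows for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = RandomForestRegressor(random_state=0)
model.fit(X_train, y_train)

# Predictions and labels now come from the same held-out rows,
# so both arrays have the same length.
y_val_pred = model.predict(X_val)
val_r2 = r2_score(y_val, y_val_pred)
```

The model can then still be used to predict on the unlabelled test file; there is simply no score to report for those rows.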
