How to test a Random Forest regression model for Overfitting?

I'm using a random forest for a regression model and want to check whether it is overfitting. Here is what I did:

I use GridSearchCV for hyperparameter tuning and then create a RandomForestRegressor with the tuned parameters (b holds the best parameters from the search):

RF = RandomForestRegressor(n_estimators=b['n_estimators'], max_depth=b['max_depth'], min_samples_leaf=b['min_samples_leaf'], random_state=0)
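For context, here is a minimal sketch of how the tuning step might look. It assumes b is GridSearchCV's best_params_ dict; the synthetic data from make_regression, the split, and the grid values are illustrative assumptions, not the asker's actual setup.

```python
# Minimal sketch: tune a RandomForestRegressor with GridSearchCV, then rebuild it
# from the best parameters. Data and grid values are assumptions for illustration.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Hypothetical search space; adjust to your own data.
param_grid = {
    "n_estimators": [20, 50],
    "max_depth": [5, None],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_absolute_error",  # scikit-learn maximizes scores, so MAE is negated
    cv=3,
)
search.fit(x_train, y_train)
b = search.best_params_  # plain dict keyed by the grid's parameter names

RF = RandomForestRegressor(
    n_estimators=b["n_estimators"],
    max_depth=b["max_depth"],
    min_samples_leaf=b["min_samples_leaf"],
    random_state=0,
)
```

With the default refit=True, search.best_estimator_ is already refit on the whole training set with these parameters, so rebuilding RF by hand is optional.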

Then I fit the model using the train dataset:

model = RF.fit(x_train, y_train.values.ravel())

Then I predict with the test dataset:

y_pred = model.predict(x_test)

Then I did the exact same with x_train instead of x_test:

y_pred = model.predict(x_train)
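A sketch of the train/test comparison that keeps each prediction set in its own variable, so the train predictions do not overwrite the test ones and each set is scored against its own targets (y_train for x_train, y_test for x_test). The synthetic data and model settings are illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (assumption)
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(x_train, y_train)

# Separate variables for the two prediction sets
y_pred_train = model.predict(x_train)
y_pred_test = model.predict(x_test)

# Score each prediction set against its matching targets
mae_train = mean_absolute_error(y_train, y_pred_train)
mae_test = mean_absolute_error(y_test, y_pred_test)
print(f"train MAE: {mae_train:.2f}  test MAE: {mae_test:.2f}")
```

Keep in mind that a random forest's error on its own training data is optimistic by construction, because fully grown trees fit most training points closely, so some train/test gap is expected even for a reasonably tuned forest.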

Here are the results I get:

Test Data:
MAE: 15.11
MAPE: 26.98%

Train Data:
MAE: 6.17
MAPE: 10.97%

As you can see, there is a fairly large difference. Do I have a serious overfitting problem, or am I doing something wrong when predicting on x_train?

Formulas for the MAE and MAPE:

MAE:

mae = sklearn.metrics.mean_absolute_error(y_test, y_pred)
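As a quick check of what mean_absolute_error computes: it is the average of the absolute differences between actual and predicted values. The three-element arrays below are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([100.0, 50.0, 80.0])
y_hat = np.array([90.0, 60.0, 80.0])

# MAE = mean(|actual - predicted|) = (10 + 10 + 0) / 3
manual = np.mean(np.abs(y_true - y_hat))
library = mean_absolute_error(y_true, y_hat)
print(manual, library)
```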

MAPE:

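import numpy as np

def percentage_error(actual, predicted):
    # Relative error per sample; when the actual value is 0, fall back to
    # dividing the prediction by the mean of the actual values
    res = np.empty(actual.shape)
    for j in range(actual.shape[0]):
        if actual[j] != 0:
            res[j] = (actual[j] - predicted[j]) / actual[j]
        else:
            res[j] = predicted[j] / np.mean(actual)
    return res

def mean_absolute_percentage_error(y_test, y_pred):
    # Mean of the absolute relative errors, expressed as a percentage
    return np.mean(np.abs(percentage_error(np.asarray(y_test), np.asarray(y_pred)))) * 100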

Source for the MAPE formula: https://stackoverflow.com/a/59033147/10603410
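The loop-based percentage_error above can also be written with vectorized NumPy operations while keeping the same zero handling (dividing the prediction by the mean of the actual values wherever an actual value is 0). mape_vectorized is a hypothetical name used only for this sketch. Note that scikit-learn 0.24 and later also provides sklearn.metrics.mean_absolute_percentage_error, but it returns a fraction rather than a percentage and guards zero targets with a small epsilon instead, so its numbers will not match this function when y contains zeros.

```python
import numpy as np

def mape_vectorized(y_true, y_pred):
    """Vectorized MAPE (in percent) with the same zero handling as the loop version."""
    actual = np.asarray(y_true, dtype=float)
    predicted = np.asarray(y_pred, dtype=float)
    nonzero = actual != 0
    # Relative error where actual != 0; `where=` skips the division by zero elsewhere
    rel = np.divide(actual - predicted, actual, out=np.zeros_like(actual), where=nonzero)
    # Fallback used by the loop version when actual == 0
    fallback = predicted / np.mean(actual)
    res = np.where(nonzero, rel, fallback)
    return np.mean(np.abs(res)) * 100

# actual = [100, 0, 50] -> per-sample errors 0.1, 5/50 = 0.1, -0.2
print(mape_vectorized([100, 0, 50], [90, 5, 60]))
```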

There is no rule of the form "if this number x is less than y, the model is overfitting"; you have to judge for yourself whether the model overfits.

By definition, if the test error is "much bigger than the train error", you are overfitting, but "much bigger" is not defined: it depends on your data and on what the model is used for. If your data is really "easy" (i.e. easy to regress), you would expect the train and test errors to be close. If it is really noisy, you could accept a bigger difference.
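One way to make "much bigger" less abstract is to watch how the gap between training and validation error changes as the model is given more capacity. The sketch below uses cross_validate with return_train_score=True on synthetic data; the dataset and the max_depth values are assumptions chosen for illustration, and the output is something to interpret, not a threshold that decides overfitting.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_validate

# Synthetic, fairly noisy data (assumption)
X, y = make_regression(n_samples=300, n_features=10, noise=20.0, random_state=0)

results = {}
for max_depth in (2, 5, None):  # hypothetical values; None lets trees grow fully
    cv = cross_validate(
        RandomForestRegressor(n_estimators=50, max_depth=max_depth, random_state=0),
        X,
        y,
        cv=5,
        scoring="neg_mean_absolute_error",
        return_train_score=True,
    )
    # Scores are negated MAE, so flip the sign back
    train_mae = -cv["train_score"].mean()
    val_mae = -cv["test_score"].mean()
    results[max_depth] = (train_mae, val_mae)
    print(f"max_depth={max_depth}: train MAE {train_mae:.2f}, "
          f"validation MAE {val_mae:.2f}, gap {val_mae - train_mae:.2f}")
```

Roughly: if validation MAE keeps improving as depth grows even while the gap widens, the extra capacity is still helping on unseen data; if validation MAE starts getting worse, that is a clearer sign of overfitting.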


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM