How to test a Random Forest regression model for Overfitting?

Question

I'm using RandomForest for a regression model and wanted to see if my model is overfitting. Here is what I did:

I use GridSearchCV for hyperparameter tuning and then create a RandomForestRegressor with those parameters:

RF = RandomForestRegressor(n_estimators=b['n_estimators'], max_depth=b['max_depth'], min_samples_leaf=b['min_samples_leaf'], random_state=0)

Then I fit the model using the train dataset:

model = RF.fit(x_train, y_train.values.ravel())

Then I predict with the test dataset:

y_pred = model.predict(x_test)

Then I did the exact same with x_train instead of x_test:

y_pred = model.predict(x_train)

Here are the results that I achieve:

Test Data:
MAE: 15.11
MAPE: 26.98%

Train Data:
MAE: 6.17
MAPE: 10.97%

As you can see, there is a pretty significant difference between the two. Do I have a big problem with overfitting, or am I doing something wrong when using x_train to predict?

Formulas for the MAE and MAPE:

MAE:

mae = sklearn.metrics.mean_absolute_error(y_test, y_pred)

MAPE:

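import numpy as np

def percentage_error(actual, predicted):
    # Per-sample relative error; both inputs are 1-D NumPy arrays of equal length.
    res = np.empty(actual.shape)
    for j in range(actual.shape[0]):
        if actual[j] != 0:
            res[j] = (actual[j] - predicted[j]) / actual[j]
        else:
            # Avoid division by zero: scale by the mean of the actual values instead.
            res[j] = predicted[j] / np.mean(actual)
    return res

def mean_absolute_percentage_error(y_test, y_pred):
    # Mean of the absolute relative errors, expressed as a percentage.
    return np.mean(np.abs(percentage_error(np.asarray(y_test), np.asarray(y_pred)))) * 100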

Source for the MAPE formula: https://stackoverflow.com/a/59033147/10603410

Answer 1

There is no rule of the form "if this number x is less than y, then the model is overfitting"; you have to draw that conclusion yourself.

By definition, if the test error is "much bigger" than the train error, you are overfitting, but "much bigger" is not precisely defined: it depends on your data and on what the model is used for. If your data is really "easy" (i.e. easy to regress), you would expect the train and test errors to be close. If it is really noisy, you could accept a bigger difference.
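
One way to get a feel for how large the gap is on your data is to measure it over several folds rather than a single split, and to see how it moves when the trees are constrained. The following is only a minimal sketch: the synthetic data from make_regression stands in for your x_train / y_train, and the helper name train_val_mae and the two min_samples_leaf values are made up for illustration, not taken from your setup. Note that a forest with fully grown trees usually fits its own training data very closely, so some gap is expected even for a reasonable model.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_validate

# Synthetic data standing in for the asker's x_train / y_train (an assumption).
X, y = make_regression(n_samples=300, n_features=8, noise=20.0, random_state=0)

def train_val_mae(min_samples_leaf):
    """Return (mean train MAE, mean validation MAE) over 5 folds.

    Hypothetical helper, written only for this illustration.
    """
    rf = RandomForestRegressor(
        n_estimators=50, min_samples_leaf=min_samples_leaf, random_state=0
    )
    cv = cross_validate(
        rf, X, y, cv=5,
        scoring="neg_mean_absolute_error",
        return_train_score=True,
    )
    # scikit-learn negates error metrics so that higher is better; undo that.
    return -cv["train_score"].mean(), -cv["test_score"].mean()

# Compare unconstrained leaves (1) with more regularized leaves (10).
results = {leaf: train_val_mae(leaf) for leaf in (1, 10)}
for leaf, (train_mae, val_mae) in results.items():
    print(f"min_samples_leaf={leaf}: train MAE={train_mae:.2f}, "
          f"validation MAE={val_mae:.2f}, gap={val_mae - train_mae:.2f}")
```

If constraining the trees shrinks the gap without making the validation error worse, that suggests the original model was fitting noise. If the validation error gets worse, the gap was probably acceptable for your data.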
