I'm using a RandomForestRegressor for a regression model and want to check whether my model is overfitting. Here is what I did:
I use GridSearchCV for hyperparameter tuning and then create a RandomForestRegressor with those parameters:
RF = RandomForestRegressor(n_estimators=b['n_estimators'], max_depth=b['max_depth'], min_samples_leaf=b['min_samples_leaf'], random_state=0)
Then I fit the model using the train dataset:
model = RF.fit(x_train, y_train.values.ravel())
Then I predict with the test dataset:
y_pred = model.predict(x_test)
Then I do exactly the same with x_train instead of x_test:
y_pred = model.predict(x_train)
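A minimal sketch of the two evaluations side by side, in case it helps rule out a bookkeeping mistake: each set of predictions gets its own name and is scored against its own targets (y_train for the train predictions, y_test for the test predictions). The synthetic data from make_regression, the split sizes, and the names y_pred_test / y_pred_train are assumptions for illustration, not the data or tuned parameters from the question.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption); replace with your own x_train / x_test.
X, y = make_regression(n_samples=400, n_features=8, noise=15.0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Placeholder hyperparameters, not the GridSearchCV result from the question.
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(x_train, y_train)

# Keep the two prediction arrays under separate names so neither is overwritten.
y_pred_test = model.predict(x_test)
y_pred_train = model.predict(x_train)

# Score each prediction array against the matching targets.
mae_test = mean_absolute_error(y_test, y_pred_test)
mae_train = mean_absolute_error(y_train, y_pred_train)

print(f"Test MAE:  {mae_test:.2f}")
print(f"Train MAE: {mae_train:.2f}")
```

Predicting on x_train is a legitimate way to get the training error, as long as the result is compared with y_train rather than y_test.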
Here are the results that I achieve:
Test Data:
MAE: 15.11
MAPE: 26.98%
Train Data:
MAE: 6.17
MAPE: 10.97%
As you can see there is a pretty significant difference. Do I have a big problem with overfitting or am I doing something wrong when using x_train to predict?
Formulas for the MAE and MAPE:
MAE:
mae = sklearn.metrics.mean_absolute_error(y_test, y_pred)
MAPE:
def percentage_error(actual, predicted):
    res = np.empty(actual.shape)
    for j in range(actual.shape[0]):
        if actual[j] != 0:
            res[j] = (actual[j] - predicted[j]) / actual[j]
        else:
            res[j] = predicted[j] / np.mean(actual)
    return res

def mean_absolute_percentage_error(y_test, y_pred):
    return np.mean(np.abs(percentage_error(np.asarray(y_test), np.asarray(y_pred)))) * 100
Source for the MAPE formula: https://stackoverflow.com/a/59033147/10603410
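For reference, the same MAPE can be written without the Python loop. This is a sketch under one assumption: both inputs are flattened to 1-D with ravel(), so a single-column DataFrame and a 1-D prediction array line up element by element. The zero-target fallback (dividing the prediction by the mean of the actual values) is kept as in the loop version.

```python
import numpy as np

def percentage_error(actual, predicted):
    # Flatten both inputs so shapes (n,) and (n, 1) are treated the same (assumption).
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    nonzero = actual != 0
    # Non-zero targets: (actual - predicted) / actual.
    # Zero targets: predicted / mean(actual), matching the loop version.
    numer = np.where(nonzero, actual - predicted, predicted)
    denom = np.where(nonzero, actual, np.mean(actual))
    return numer / denom

def mean_absolute_percentage_error(y_true, y_pred):
    # Mean of the absolute percentage errors, expressed in percent.
    return np.mean(np.abs(percentage_error(y_true, y_pred))) * 100

print(mean_absolute_percentage_error([100, 200], [110, 180]))
```

The ravel() call matters in practice: subtracting a 1-D prediction array from an (n, 1) target array would otherwise broadcast to an (n, n) matrix and silently give a wrong metric.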
There is no rule of the form "if this number x is less than y, then we are overfitting"; you have to judge for yourself whether the model is overfitting.
By definition, if the test error is "much bigger" than the train error, you are overfitting, but "much bigger" is not defined: it depends on your data and on what the model is used for. If your data is really "easy" (i.e. easy to regress), you would expect the train and test errors to be close. If it is really noisy, you could accept a bigger difference.
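One way to put the gap in context is to measure it over several folds instead of a single split, so you can see how stable the train/validation difference is. The sketch below uses scikit-learn's cross_validate with return_train_score=True; the synthetic data and the hyperparameters are assumptions for illustration, and the resulting ratio is only something to interpret against your own data, not a threshold.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_validate

# Synthetic stand-in data (assumption); replace with your own features and targets.
X, y = make_regression(n_samples=300, n_features=8, noise=20.0, random_state=0)

# Placeholder hyperparameters, not the tuned values from the question.
rf = RandomForestRegressor(n_estimators=50, random_state=0)

# The scorer returns negative MAE, so flip the sign to get errors back.
scores = cross_validate(
    rf, X, y, cv=5,
    scoring="neg_mean_absolute_error",
    return_train_score=True,
)
train_mae = -scores["train_score"].mean()
val_mae = -scores["test_score"].mean()

print(f"Mean train MAE:      {train_mae:.2f}")
print(f"Mean validation MAE: {val_mae:.2f}")
print(f"Validation / train:  {val_mae / train_mae:.2f}")
```

A random forest with fully grown trees will usually show a much lower train error than validation error, so some gap is expected. If increasing min_samples_leaf or lowering max_depth narrows the gap without making the validation error worse, the original gap was probably harmless.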