
How to achieve a regression model without underfitting or overfitting

For a university project I was given a dataset in which almost all features correlate only weakly with the target (just one feature has a moderate correlation). Its distribution is not normal either. I first tried a simple linear regression, which underfit. Then I tried a plain random forest regressor, which overfit. When I tuned the random forest with RandomizedSearchCV, it took a very long time to run. Is there any way to get a decent model from a not-so-good dataset without underfitting or overfitting, or is it just not possible at all?

Well, to be blunt, if you could fit a model without underfitting or overfitting you would have solved AI completely.

Some suggestions, though:

Overfitting on random forests

  • Personally, I'd try to hack this route since you mention that your data is not strongly correlated. It's typically easier to fix overfitting than underfitting so that helps, too.

  • Try looking at your tree outputs. If you are using Python, scikit-learn's export_graphviz can be helpful.

  • Try reducing the maximum depth of the trees.

  • Try increasing the minimum number of samples a node must have in order to split (or similarly, the minimum number of samples a leaf should have).

  • Try increasing the number of trees in the RF.
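The suggestions in this list can be sketched with scikit-learn roughly as follows. This is a minimal sketch, not a tuned recipe: the synthetic data from make_regression is only a stand-in for the real dataset, the helper fit_and_score is a hypothetical name, and the specific values (max_depth=5, min_samples_leaf=10, n_estimators=300) are illustrative assumptions to be adjusted against a held-out set.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import export_graphviz

# Synthetic stand-in for the real dataset (assumption): few informative
# features and a lot of noise.
X, y = make_regression(n_samples=500, n_features=10, n_informative=2,
                       noise=30.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)


def fit_and_score(**params):
    """Hypothetical helper: fit a forest and return it with train/test R^2."""
    model = RandomForestRegressor(random_state=0, **params)
    model.fit(X_train, y_train)
    return model, model.score(X_train, y_train), model.score(X_test, y_test)


# Default forest: trees are grown until the leaves are (nearly) pure.
default_model, default_train_r2, default_test_r2 = fit_and_score()

# Constrained forest: shallower trees, larger leaves, more trees.
constrained_model, constrained_train_r2, constrained_test_r2 = fit_and_score(
    n_estimators=300, max_depth=5, min_samples_leaf=10)

print(f"default:     train R^2={default_train_r2:.3f}  test R^2={default_test_r2:.3f}")
print(f"constrained: train R^2={constrained_train_r2:.3f}  test R^2={constrained_test_r2:.3f}")

# Inspect one tree of the forest; out_file=None returns the DOT source as a
# string, which can be rendered with Graphviz.
dot_source = export_graphviz(constrained_model.estimators_[0],
                             out_file=None, max_depth=2)
print(dot_source[:200])
```

A large gap between the train and test scores is the usual sign of overfitting; the constraints are worth keeping only if they shrink that gap without hurting the test score.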

Underfitting on linear regression

  • Add more parameters. If you have variables a, b, etc., adding their polynomial features (i.e. a^2, a^3, ..., b^2, b^3, ...) may help. If you add enough polynomial features you should be able to overfit, although fitting the train set well doesn't necessarily mean the model will have a good fit on held-out data (RMSE value).

  • Try plotting some of the variables against the value to predict (y). You may be able to see a non-linear pattern (e.g. a logarithmic relationship).

  • Do you know anything about the data? Perhaps a variable that is the product of, or the ratio between, two variables may be a good indicator.

  • If your regression is regularized (or if the software applies regularization automatically), try reducing the regularization parameter.
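As a rough illustration of the points in this list, the sketch below compares a plain regularized linear model with one that gets polynomial features and a weaker penalty, and with one that gets a hand-made ratio feature. Everything specific here is an assumption for demonstration: the synthetic data (a target that truly depends on the ratio of two variables), the helper name cv_rmse, the degree of 3, and the alpha values.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (assumption): y depends on the ratio a / b.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 5.0, size=(400, 2))
y = X[:, 0] / X[:, 1] + rng.normal(scale=0.1, size=400)


def cv_rmse(model, features):
    """Hypothetical helper: mean 5-fold cross-validated RMSE."""
    scores = cross_val_score(model, features, y, cv=5,
                             scoring="neg_root_mean_squared_error")
    return -scores.mean()


# Baseline: linear model on the raw variables, which underfits here.
linear = make_pipeline(StandardScaler(), Ridge(alpha=1.0))

# More parameters and a smaller regularization parameter.
poly = make_pipeline(PolynomialFeatures(degree=3, include_bias=False),
                     StandardScaler(), Ridge(alpha=0.01))

# Domain-knowledge feature: append the ratio of the two variables.
X_ratio = np.column_stack([X, X[:, 0] / X[:, 1]])

rmse_linear = cv_rmse(linear, X)
rmse_poly = cv_rmse(poly, X)
rmse_ratio = cv_rmse(linear, X_ratio)

print(f"raw linear:        RMSE={rmse_linear:.3f}")
print(f"degree-3 features: RMSE={rmse_poly:.3f}")
print(f"ratio feature:     RMSE={rmse_ratio:.3f}")
```

Scoring with cross-validation rather than on the train set keeps the comparison honest: adding features always helps the train fit, but only a drop in the cross-validated RMSE suggests the underfitting is actually being fixed.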
