
Random Forest vs Logistic Regression

I am working on a classification problem. One column of the dataset has around 11,000 missing values out of 300k total observations. It is a categorical variable, so numerical-style imputation (e.g. mean or median) is not possible.

Is it advisable to go with a Random Forest rather than Logistic Regression, given that Random Forest is said to be unaffected by missing values?

Also, do I need to account for multicollinearity among the independent variables when using RF, or is that unnecessary?

  1. Although a random forest model can handle noisy data and missing values, it is hard to say it is better than logistic regression, because logistic regression can also be improved through preprocessing (e.g. PCA or missing-data imputation) or ensemble methods.

  2. I think RF does not need to account for multicollinearity. Variables are randomly selected to build different trees, and the results are aggregated. In this process the most important attributes are chosen, which can be interpreted as mitigating the problem of multicollinearity among variables with similar trends.
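To illustrate point 1, here is a minimal sketch of the imputation route: since the column is categorical, the missing values can simply be treated as their own "Missing" category, after which both logistic regression and a random forest can use the column. This assumes pandas and scikit-learn; the column names and toy data are illustrative, not from the original question.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy data: one categorical feature with ~5% missing values, one numeric one.
rng = np.random.default_rng(0)
n = 1000
cat = rng.choice(["A", "B", "C"], size=n).astype(object)
cat[rng.random(n) < 0.05] = None                  # inject missing values
num = rng.normal(size=n)
y = (num + (cat == "A").astype(float) > 0.5).astype(int)
df = pd.DataFrame({"cat": cat, "num": num, "y": y})

# Treat missing as its own category, then one-hot encode -- this makes the
# column usable by both models, so missingness alone need not decide RF vs LR.
df["cat"] = df["cat"].fillna("Missing")
X = pd.get_dummies(df[["cat", "num"]], columns=["cat"])

X_tr, X_te, y_tr, y_te = train_test_split(X, df["y"], random_state=0)
lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("logistic regression accuracy:", lr.score(X_te, y_te))
print("random forest accuracy:      ", rf.score(X_te, y_te))
```

Note that many random forest implementations (including scikit-learn's) do not actually accept missing values natively, so some handling like the above is usually needed either way.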
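On point 2, even if RF predictions tolerate multicollinearity, it still distorts logistic regression coefficients and RF feature importances, so it can be worth checking. A common diagnostic is the variance inflation factor (VIF), sketched here with NumPy only; the data is synthetic and the helper name is my own.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of a design matrix X.

    VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing column j
    on all the other columns. Values well above ~5-10 suggest collinearity.
    """
    X = np.asarray(X, dtype=float)
    vifs = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(y)), others])  # add intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        vifs.append(1.0 / (1.0 - r2))
    return vifs

rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.1, size=500)   # nearly collinear with x1
x3 = rng.normal(size=500)                   # independent
print(vif(np.column_stack([x1, x2, x3])))   # large VIFs for x1, x2
```

For logistic regression, dropping or combining high-VIF variables (or using PCA, as mentioned above) stabilizes the coefficients; for RF it mainly affects how importance is split among the correlated variables.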

