
Random Forest Feature Importance Robustness with Python

I am using Random Forest from Sklearn for feature importance. However, the importance of features may change by changing the random_state parameter in RF. I am wondering if there is any way to get robust feature importance with RF?

This follows from the principle of the Random Forest algorithm. Each tree is grown greedily, choosing splits heuristically rather than searching for a global optimum. To mitigate the weaknesses of that heuristic, RF combines many trees, each trained on randomly sampled rows and randomly sampled features. The random_state parameter supplies the random numbers used for that sampling. The documentation linked below says:

If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
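As a rough illustration of the three forms described in that quote, here is a minimal sketch. The synthetic dataset from make_classification, the seed value 42, and the variable names are assumptions made for the example, not part of the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only for illustration (an assumption, not the asker's data).
X, y = make_classification(n_samples=200, n_features=6, n_informative=3, random_state=0)

# int: the value seeds the random number generator used during fit.
rf_int = RandomForestClassifier(n_estimators=20, random_state=42).fit(X, y)

# RandomState instance: this object itself is used as the generator.
# It is stateful, so reusing the same instance for a second fit
# continues the random stream instead of restarting it.
rng = np.random.RandomState(42)
rf_rs = RandomForestClassifier(n_estimators=20, random_state=rng).fit(X, y)

# None: no fixed seed, so results can differ from run to run.
rf_none = RandomForestClassifier(n_estimators=20, random_state=None).fit(X, y)

for name, model in [("int", rf_int), ("RandomState", rf_rs), ("None", rf_none)]:
    print(name, np.round(model.feature_importances_, 3))
```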

So if you set random_state to a fixed value, you should get the same feature importances on every run with the same data and settings. That gives reproducibility, not robustness: RF is not an algorithm that guarantees robust importances; it gives an answer based on its heuristic search.
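To make the difference between a fixed seed and robustness concrete, here is a minimal sketch. The synthetic data, the helper name importances, and the choice of 10 seeds and 50 trees are all assumptions for illustration. Looking at the mean and spread across seeds is one simple way to gauge how stable the importances are; it is not something the algorithm guarantees.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only for illustration (an assumption, not the asker's data).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=2, random_state=0
)

def importances(seed, n_estimators=50):
    """Hypothetical helper: fit a forest with the given seed, return its importances."""
    rf = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
    rf.fit(X, y)
    return rf.feature_importances_

# Same seed twice: the two runs should agree.
same_a = importances(0)
same_b = importances(0)
print("same seed, same result:", np.allclose(same_a, same_b))

# Different seeds: collect one row of importances per seed.
all_runs = np.array([importances(seed) for seed in range(10)])
mean_imp = all_runs.mean(axis=0)  # average importance per feature
std_imp = all_runs.std(axis=0)    # how much each feature's importance moves with the seed

for i, (m, s) in enumerate(zip(mean_imp, std_imp)):
    print(f"feature {i}: mean={m:.3f} std={s:.3f}")
```

A feature whose standard deviation is large relative to its mean is one whose ranking depends heavily on the seed. Increasing n_estimators usually shrinks that spread, at the cost of longer training.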
