How to make sklearn.ensemble.RandomForestRegressor ignore the impurity decrease heuristic

Question

I am using sklearn's RandomForestRegressor to implement Random Forest Imputation. Sklearn lets us set the parameter min_impurity_decrease, which acts as a stopping criterion for splits. For example, if min_impurity_decrease = 0.0 and a node split would result in a worse impurity, the node is made a leaf node.
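For reference, the scikit-learn documentation describes the quantity compared against min_impurity_decrease as N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity). Below is a minimal pure-Python sketch of that formula for a regression node, assuming squared-error impurity (the variance of the targets). The function names are made up for illustration and are not part of sklearn.

```python
from statistics import pvariance


def mse_impurity(values):
    """Squared-error impurity of a node: population variance of its targets."""
    return pvariance(values) if len(values) > 1 else 0.0


def weighted_impurity_decrease(parent, left, right, n_total):
    """Weighted impurity decrease of splitting `parent` into `left` and `right`.

    n_total is N, the number of samples in the whole training set.
    """
    n_t = len(parent)
    n_l, n_r = len(left), len(right)
    return (n_t / n_total) * (
        mse_impurity(parent)
        - (n_r / n_t) * mse_impurity(right)
        - (n_l / n_t) * mse_impurity(left)
    )


# Toy targets at the root node, so N_t == N.
y = [1.0, 2.0, 8.0, 9.0]
good = weighted_impurity_decrease(y, y[:2], y[2:], n_total=len(y))
poor = weighted_impurity_decrease(y, y[:1], y[1:], n_total=len(y))
print(good, poor)
```

Both toy splits give a positive decrease. With variance as the impurity, the children's weighted variance cannot exceed the parent's, so this quantity should not drop below zero except through floating-point rounding.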

The problem is that I would prefer the Random Forest to be fully grown, without early stopping or pruning. But min_impurity_decrease has to be a non-negative float. Is there any way around this?

As a first attempt, I tried setting min_impurity_decrease = float("-inf"), which results in an error message.

Answer 1

Apparently you have to modify the sklearn code. Take a look at this answer on how to install sklearn in editable mode. Be sure to create a new virtual environment so that you don't mess up the original sklearn files.

The good news is that you don't have to change any Cython code. Go to the file sklearn/tree/tree.py. The check on the value of min_impurity_decrease only seems to be present in the BaseDecisionTree class. According to GitHub, at line 306 there is this code snippet:

if self.min_impurity_decrease < 0.:
    raise ValueError("min_impurity_decrease must be greater than "
                     "or equal to 0")

Simply delete this check and reload the library. I couldn't test this solution, so let me know if you run into any problems.
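After editing, it may be worth confirming that Python actually picks up the edited copy rather than another installation. This is a small sketch using only the standard library; locate_package is a hypothetical helper name, and the path it prints depends entirely on your environment.

```python
import importlib.util


def locate_package(name):
    """Return the file a top-level package would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Should point into your editable checkout, not into site-packages.
print(locate_package("sklearn"))
```

If the printed path is not inside the virtual environment's editable checkout, the deleted check will still be active in the copy that gets imported.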
