Random Forest - make null values always have their own branch in a decision tree

Question

Hi, I am using a random forest to build a model and I am trying to deal with null values. Does anyone know how to force the random forest to treat null values as their own separate band? That is, null values should never be banded together with other value ranges, so that in a decision tree the null values of a measure always get their own branch.

I don't want to replace nulls with the mean, because the model would then band the null values together with other values close to the mean, and I don't want to remove the nulls either.

I want the decision tree to always treat the null values of a measure as their own branch.

Thanks:)

Answer 1

You could try either of these.

Replace null values with a sentinel value that lies far outside the range of the other values in the column, so that a single split can separate the nulls from everything else.

Example

Let 'feature' be the name of a column that contains only positive values. A negative value then suffices as the stand-in for null.

dataframe.loc[dataframe['feature'].isna(), 'feature'] = -100
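To see what the sentinel does inside a tree, here is a minimal sketch on made-up data. It assumes scikit-learn (the question does not say which library is in use) and fits a single DecisionTreeClassifier so the split is easy to read. Keep in mind that a tree only puts the nulls on their own branch when that split actually improves purity; the sentinel just makes that possible with one threshold.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data, invented for illustration: 'feature' is strictly positive,
# and 'target' happens to be 1 exactly where 'feature' is null.
dataframe = pd.DataFrame({
    "feature": [1.0, 2.0, np.nan, 8.0, np.nan, 9.0, 3.0, np.nan],
    "target":  [0,   0,   1,      0,   1,      0,   0,   1],
})

SENTINEL = -100  # any value well below the real range of 'feature'
filled = dataframe["feature"].fillna(SENTINEL)

tree = DecisionTreeClassifier(random_state=0)
tree.fit(filled.to_frame(), dataframe["target"])

# The root threshold falls between the sentinel and the smallest real value,
# so every former null lands on one side of the split.
root_threshold = tree.tree_.threshold[0]
print(export_text(tree, feature_names=["feature"]))
```

If the column can contain negative values, pick a sentinel below the column minimum instead of hard-coding -100.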

Alternatively, you could add a new null-tracking column that records where another column is null. (Use this if all features are considered when building the random forest.)

Example

Let 'feature' be the name of a column with null values

# Null-tracking column: 1 where 'feature' is null, 0 otherwise
dataframe['feature_isnull'] = dataframe['feature'].isna().astype(int)
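Here is a rough sketch of how the null-tracking column could be fed to a forest. It assumes scikit-learn's RandomForestClassifier, which the question does not name, and toy data invented for illustration. Two things in it are my own assumptions rather than part of the answer above: the original column is filled with a placeholder (0 here) in case the estimator rejects NaN inputs, and max_features=None is one reading of "all features are considered", since it lets every split look at the indicator.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy data, invented for illustration.
dataframe = pd.DataFrame({
    "feature": [1.0, np.nan, 3.0, np.nan, 5.0, 6.0],
    "target":  [0,   1,      0,   1,      0,   0],
})

# Null-tracking column: 1 where 'feature' is null, 0 otherwise.
dataframe["feature_isnull"] = dataframe["feature"].isna().astype(int)

# Assumption: fill the original column with a placeholder so the estimator
# receives no NaN values; the indicator column carries the missingness signal.
dataframe["feature_filled"] = dataframe["feature"].fillna(0)

X = dataframe[["feature_filled", "feature_isnull"]]
y = dataframe["target"]

# max_features=None makes every split consider all columns, including the indicator.
model = RandomForestClassifier(n_estimators=50, max_features=None, random_state=0)
model.fit(X, y)

# Relative importance of each column, to check whether the indicator gets used.
importances = dict(zip(X.columns, model.feature_importances_))
print(importances)
```

A split on feature_isnull sends exactly the null rows down one branch, which is the closest this approach gets to a dedicated null branch; whether a given tree uses that split still depends on the data.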

1 answer

solution1
0 2019-11-21 10:26:00