
Balanced random forest in R using H2O

I'm currently working on a highly unbalanced multi-class classification problem, so I'm considering balanced random forests ( https://statistics.berkeley.edu/sites/default/files/tech-reports/666.pdf ). Does anyone have experience implementing balanced random forests with H2O? If so, could you please help with the following question:

Is it possible to change the way H2O creates bootstrap samples so that each tree is grown on a balanced sub-sample of the original data set? That is, for each iteration of the random forest, draw a bootstrap sample from the minority class, then randomly draw the same number of cases, with replacement, from each of the majority classes.

H2O's random forest doesn't perform bootstrapping. Instead, it samples rows at a rate of 63.2%, which is the expected fraction of unique rows in a bootstrap sample.
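For reference, here is a short derivation of where the 63.2% figure comes from. It assumes an ordinary bootstrap that draws n rows with replacement from a data set of n rows; the symbols n, i and U are introduced here only for illustration.

```latex
% Probability that a given row i is never picked in n draws with replacement:
\Pr(\text{row } i \text{ not drawn}) = \left(1 - \frac{1}{n}\right)^{n}

% Expected fraction of unique rows U among the n rows:
\mathbb{E}\!\left[\frac{U}{n}\right]
  = 1 - \left(1 - \frac{1}{n}\right)^{n}
  \;\xrightarrow[n \to \infty]{}\; 1 - e^{-1} \approx 0.632
```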

If you want a balanced sample, you can use the parameter balance_classes together with class_sampling_factors, or use weights_column.
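A minimal sketch of both options in R, not a definitive recipe. The data set (an artificially unbalanced subset of iris), the weight column name w, and the sampling factors are assumptions made up for illustration; only balance_classes, class_sampling_factors and weights_column come from the answer above. It needs a running H2O cluster, which h2o.init() starts locally.

```r
library(h2o)
h2o.init()  # starts or connects to a local H2O cluster

# Hypothetical unbalanced three-class data carved out of iris:
# 50 setosa, 15 versicolor, 5 virginica.
dat <- iris[c(1:50, 51:65, 101:105), ]

# Inverse-frequency observation weights (the largest class gets weight 1).
counts <- table(dat$Species)
dat$w <- as.numeric(max(counts) / counts[as.character(dat$Species)])

train <- as.h2o(dat)
predictors <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Option 1: let H2O over/under-sample the classes.
rf_balanced <- h2o.randomForest(
  x = predictors, y = "Species", training_frame = train,
  ntrees = 50, seed = 1,
  balance_classes = TRUE,
  # One factor per class, in lexicographic order of the class labels
  # (setosa, versicolor, virginica); these values are illustrative only.
  class_sampling_factors = c(1, 3, 10)
)

# Option 2: leave the sampling alone and weight the rows instead.
rf_weighted <- h2o.randomForest(
  x = predictors, y = "Species", training_frame = train,
  ntrees = 50, seed = 1,
  weights_column = "w"
)

# Compare per-class errors of the two models.
h2o.confusionMatrix(rf_balanced)
h2o.confusionMatrix(rf_weighted)
```

If class_sampling_factors is left out, H2O computes the factors itself when balance_classes = TRUE. Note that neither option reproduces the per-tree balanced bootstrap from the linked report exactly; both are approximations of it.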
