
Balanced random forest in R using H2O

I'm currently working on a highly unbalanced multi-class classification problem, so I'm considering balanced random forests (https://statistics.berkeley.edu/sites/default/files/tech-reports/666.pdf). Do you have experience implementing balanced random forests using H2O? If so, could you please elaborate on the following question:

Is it possible to change H2O's default process for creating bootstrap samples so that each tree is grown on a balanced sub-sample of the original data set? That is, for each iteration in the random forest, draw a bootstrap sample from the minority class, then randomly draw the same number of cases, with replacement, from the majority classes.
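To make the sampling scheme in the question concrete, here is a minimal sketch in plain Python (standard library only, no H2O calls). The function name `balanced_bootstrap` and the toy labels are hypothetical, and extending the scheme to several classes by using the size of the smallest class is an assumption on my part rather than something taken from the linked report.

```python
import random
from collections import defaultdict

def balanced_bootstrap(labels, rng):
    """Return row indices for one balanced bootstrap sample.

    Each class contributes as many rows as the smallest class contains,
    drawn with replacement from that class's own rows.
    """
    rows_by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        rows_by_class[label].append(idx)

    # Size of the smallest (minority) class sets the per-class draw count.
    n_minority = min(len(rows) for rows in rows_by_class.values())

    sample = []
    for rows in rows_by_class.values():
        # random.Random.choices samples with replacement.
        sample.extend(rng.choices(rows, k=n_minority))
    return sample

# Toy, highly unbalanced three-class label vector (hypothetical data).
labels = ["a"] * 90 + ["b"] * 8 + ["c"] * 2
sample = balanced_bootstrap(labels, random.Random(0))
```

In a balanced random forest, one such index set would be drawn per tree, so every tree sees the classes in equal proportion.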

H2O's random forest doesn't perform bootstrapping; instead, it samples at a rate of 63.2% (which is the expected fraction of unique rows in a bootstrapped sample).
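For context on where 63.2% comes from: when n rows are drawn with replacement from n rows, a given row is missed with probability (1 - 1/n)^n, so the expected share of distinct rows is 1 - (1 - 1/n)^n, which tends to 1 - 1/e as n grows. The short check below is only an illustration of that arithmetic; the function name is hypothetical.

```python
import math

def expected_unique_fraction(n):
    """Expected share of distinct rows when n rows are drawn with replacement from n rows."""
    return 1.0 - (1.0 - 1.0 / n) ** n

# Large-n limit of the expression above: 1 - 1/e, roughly 0.632.
limit = 1.0 - math.exp(-1.0)

for n in (10, 100, 10_000):
    print(n, round(expected_unique_fraction(n), 4))
print("limit", round(limit, 4))
```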

If you want to get a balanced sample, you can use the parameter balance_classes with class_sampling_factors, or weights_column.
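As a rough illustration of the weights_column route, the sketch below computes one weight per row so that every class carries the same total weight. It is plain Python and does not call H2O; the inverse-frequency scheme and the name `inverse_frequency_weights` are assumptions for illustration, not something the answer prescribes. The resulting values would be stored as an extra column in the training frame, and that column's name passed as weights_column.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-row weights chosen so that each class has the same total weight.

    A row of class c gets n_rows / (n_classes * count(c)), so the weights
    sum to n_rows overall and to n_rows / n_classes within each class.
    """
    counts = Counter(labels)
    n_rows, n_classes = len(labels), len(counts)
    return [n_rows / (n_classes * counts[label]) for label in labels]

# Toy, highly unbalanced three-class label vector (hypothetical data).
labels = ["a"] * 90 + ["b"] * 8 + ["c"] * 2
weights = inverse_frequency_weights(labels)
```

Note that this reweights rows rather than resampling them, so it is not the same procedure as drawing balanced sub-samples per tree.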
