H2O Random Forest hangs on completion

Question

I'm training a Random Forest with h2o and R on a large dataset (~6 million rows) with ~50 output classes. Even though the progress bar reaches 100%, the console (and the processor!) stays busy and has been hanging for over an hour (so far!). It's definitely not a resource limitation: I have 120 GB of RAM and a couple of dozen cores.

It's hard to give a fully reproducible example given the nature of the issue, but there are 35 variables, half of which are factors. I'm running the model training through R with the following options:

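rforest <- h2o.randomForest(y = y.var                      # response column (~50 classes)
                          , x = x.vars                     # 35 predictors, about half factors
                          , training_frame = trainData.h2o
                          , validation_frame = testData.h2o  # also scored during training
                          , ntrees = 100
                          , stopping_rounds = 3            # early stopping on the default (AUTO) metric
                          , seed = 42
                          , model_id = modCode
                          , mtries = -1)                   # -1 = default, sqrt(p) for classification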

Has anyone encountered a similar issue, or does anyone have an explanation or know a workaround?

Answer 1

Did you log-transform the response variable (i.e. y) before running the model? If so, are you sure there were no y = 1 values BEFORE the log transformation? I had a similar issue, and the model ran really fast after I removed the rows with y = 1 from the data set.
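A minimal sketch of the check this answer describes, in base R: count the rows whose raw response is exactly 1 (these become 0 after `log()`), drop them, then log-transform. The data frame `df` and its columns `y` and `x1` are hypothetical stand-ins for the real training data, and the answer does not explain why these rows slow h2o down; the sketch only reproduces the check and filter.

```r
# Hypothetical data standing in for the training set; column names are assumptions.
df <- data.frame(y = c(1, 2.5, 10, 1, 40), x1 = c(3, 1, 4, 1, 5))

# Rows whose raw response equals 1 (log(1) is exactly 0 after transformation).
ones <- !is.na(df$y) & df$y == 1
cat("rows with y == 1 before log transform:", sum(ones), "\n")

# Drop those rows, as the answer describes, then log-transform the response.
df_clean <- df[!ones, , drop = FALSE]
df_clean$log_y <- log(df_clean$y)

# Sanity check: the transformed response is finite and contains no zeros.
stopifnot(all(is.finite(df_clean$log_y)), all(df_clean$log_y != 0))
```

On the real data, the filtered frame would then be converted with `as.h2o()` before training.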

(Answer 1: score 0, posted 2019-02-15 12:00:21)