
H2O Random Forest Hangs on Completion

I'm training a Random Forest using h2o and R on a large dataset (~6 million rows) with ~50 output levels. Although the progress bar has hit 100%, the console (and the processor!) is still busy and has been hanging for over an hour (so far!). It is definitely not a resource limitation: I have 120 GB of RAM and a couple of dozen cores.

It is hard to give a fully reproducible example given the nature of the issue, but there are 35 variables, half of which are factors. I'm running the model training through R with the following options:

rforest <- h2o.randomForest(y = y.var
                          , x = x.vars
                          , training_frame = trainData.h2o
                          , validation_frame = testData.h2o
                          , ntrees = 100
                          , stopping_rounds = 3
                          , seed = 42
                          , model_id = modCode
                          , mtries = -1)

Has anyone encountered a similar issue, or does anyone have an explanation or know a workaround?

Did you apply a logarithmic transformation to the response variable (i.e. y) before running the model? If so, are you sure you did not have any y = 1 values BEFORE you log-transformed it? I had a similar issue, and the model ran really fast after I removed the rows with y = 1 from the data set.
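If a log-transformed numeric response is in play, a quick check along these lines might help confirm or rule that out before training. This is only a sketch in base R: the vector `y` and its values are made up for illustration, and in practice it would be the response column from your own data before it is sent to H2O.

```r
# Hypothetical numeric response (illustrative values only).
y <- c(1, 2.5, 10, 1, 0, 7)

# Rows with y equal to 1 before the transform, as described above.
n_ones <- sum(y == 1)

# log(1) is 0 and log(0) is -Inf, so also count non-finite results.
y_log <- log(y)
n_nonfinite <- sum(!is.finite(y_log))

cat("rows with y == 1 before log:", n_ones, "\n")
cat("non-finite values after log:", n_nonfinite, "\n")

# Drop the y == 1 rows, then transform what is left.
keep <- y != 1
y_filtered_log <- log(y[keep])
```

If both counts are zero, this explanation probably does not apply to your case. It also only makes sense for a numeric response, not for a factor with ~50 levels.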
