H2O Random Forest Hangs on Completion
I'm training a Random Forest using h2o and R on a large dataset (~6 million rows) with ~50 output levels. Although the progress bar reaches 100%, the console (and the processor!) stays busy and has been hanging for over an hour so far. It is definitely not a resource limitation: I have 120 GB of RAM and a couple of dozen cores.
It is hard to give a fully reproducible example given the nature of the issue, but there are 35 variables, half of which are factors. I'm running the model training through R with the following options:
rforest <- h2o.randomForest(y = y.var,
                            x = x.vars,
                            training_frame = trainData.h2o,
                            validation_frame = testData.h2o,
                            ntrees = 100,
                            stopping_rounds = 3,
                            seed = 42,
                            model_id = modCode,
                            mtries = -1)
Has anyone encountered a similar issue, or does anyone have an explanation or know a workaround, please?
Did you apply a logarithmic transformation to the response variable (i.e. y) before running the model? If so, are you sure there were no y = 1 values BEFORE you log-transformed it? I had a similar issue, and the model ran really fast after I removed the rows with y = 1 from the data set.
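For anyone wanting to try this suggestion, here is a minimal base R sketch of the check. It assumes a numeric response; the data frame `train` and its columns `y` and `x1` are made up for illustration and are not from the original post. The answer does not say why these rows matter, though note that log(1) = 0, so they become zero-valued responses after the transform.

```r
# Hypothetical data: `train`, `y` and `x1` are illustrative names only.
train <- data.frame(
  y  = c(1, 2.5, 1, 10, 4),
  x1 = c(0.3, 1.2, 0.7, 2.1, 1.5)
)

# Count the rows the answer refers to (y equal to 1 before the log transform).
n_ones <- sum(train$y == 1)

# Drop those rows, then apply the log transform to what remains.
train_filtered <- train[train$y != 1, , drop = FALSE]
train_filtered$log_y <- log(train_filtered$y)
```

The filtered data frame would then be converted with as.h2o() before training. Whether this helps with a 50-level factor response, as in the question, is not established by the answer.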