H2O local server has died unexpectedly

Question

I am having a problem reproducing the AutoML tutorial in the H2O documentation. After initiating my h2o local server (h2o.init()) I get the following output, which looks correct:

Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.
Attempting to start a local H2O server...
  Java Version: java version "1.8.0_181"; Java(TM) SE Runtime Environment (build 1.8.0_181-b13); Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
  Starting server from /home/cdsw/.local/lib/python3.8/site-packages/h2o/backend/bin/h2o.jar
  Ice root: /tmp/tmp3nh32di4
  JVM stdout: /tmp/tmp3nh32di4/h2o_cdsw_started_from_python.out
  JVM stderr: /tmp/tmp3nh32di4/h2o_cdsw_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.
H2O_cluster_uptime: 01 secs
H2O_cluster_timezone:   Etc/UTC
H2O_data_parsing_timezone:  UTC
H2O_cluster_version:    3.32.1.3
H2O_cluster_version_age:    14 days, 20 hours and 29 minutes
H2O_cluster_name:   H2O_from_python_cdsw_cpcrap
H2O_cluster_total_nodes:    1
H2O_cluster_free_memory:    13.98 Gb
H2O_cluster_total_cores:    32
H2O_cluster_allowed_cores:  32
H2O_cluster_status: accepting new members, healthy
H2O_connection_url: http://127.0.0.1:54321
H2O_connection_proxy:   {"http": null, "https": null}
H2O_internal_security:  False
H2O_API_Extensions: Amazon S3, XGBoost, Algos, AutoML, Core V3, TargetEncoder, Core V4
Python_version: 3.8.5 final

Next, I import the datasets as specified by the tutorial:

# Identify predictors and response
x = train.columns
y = "response"
x.remove(y)

# For binary classification, response should be a factor
train[y] = train[y].asfactor()
test[y] = test[y].asfactor()

Finally, I train my AutoML model:

# Run AutoML for 20 base models (limited to 1 hour max runtime by default)
aml = H2OAutoML(max_models=20, seed=1)
aml.train(x=x, y=y, training_frame=train)

That is when it crashes with the following message:

AutoML progress: |██Failed polling AutoML progress log: Local server has died unexpectedly. RIP.
Job request failed Local server has died unexpectedly. RIP., will retry after 3s.
Job request failed Local server has died unexpectedly. RIP., will retry after 3s.

I have tried different datasets, including a small sample in case it was a memory issue, but to no avail. The error persists.

Does anyone know what I should do to fix this?

Much appreciated!

Regards.

Answer 1

I think I was able to solve it. After some monitoring with the htop command, I think the issue was in fact a memory one. I restarted h2o, limiting the memory to 1 GB and 2 threads (maybe this is not strictly necessary), and everything now seems to run fine.

h2o.init(max_mem_size="1G", nthreads=2)
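To confirm a memory problem rather than infer it from htop, one option is to scan the JVM log files whose paths h2o.init() prints at startup (the "JVM stdout" and "JVM stderr" lines in the question). The following is only a minimal sketch: the helper name find_memory_errors and the list of marker strings are my own assumptions, not part of the h2o API, and a process killed from outside the JVM may leave no such message at all.

```python
from pathlib import Path

# Strings that commonly accompany JVM memory failures. This list is an
# assumption for illustration and is not exhaustive.
OOM_MARKERS = ("java.lang.OutOfMemoryError", "Cannot allocate memory")


def find_memory_errors(log_paths, markers=OOM_MARKERS):
    """Return (path, line_number, line) for every log line containing a marker."""
    hits = []
    for path in map(Path, log_paths):
        if not path.is_file():
            continue  # the temporary Ice root may already have been cleaned up
        with path.open(errors="replace") as fh:
            for number, line in enumerate(fh, start=1):
                if any(marker in line for marker in markers):
                    hits.append((str(path), number, line.rstrip()))
    return hits


if __name__ == "__main__":
    # Paths copied from the startup output in the question; yours will differ.
    logs = [
        "/tmp/tmp3nh32di4/h2o_cdsw_started_from_python.out",
        "/tmp/tmp3nh32di4/h2o_cdsw_started_from_python.err",
    ]
    for path, number, line in find_memory_errors(logs):
        print(f"{path}:{number}: {line}")
```

If this prints nothing, the logs contain none of the listed markers, which does not rule out a memory problem; it only means the JVM did not report one in those files.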

Hope this helps anyone who stumbles on the same problem.
