How to save a parsnip/agua-based H2O object and retrieve it again

Question

I have the following script using tidymodels' agua package:

library(tidymodels)
library(agua)
library(ggplot2)
theme_set(theme_bw())
h2o_start()

data(concrete)
set.seed(4595)
concrete_split <- initial_split(concrete, strata = compressive_strength)
concrete_train <- training(concrete_split)
concrete_test <- testing(concrete_split)

# run for a maximum of 120 seconds
auto_spec <-
  auto_ml() %>%
  set_engine("h2o", max_runtime_secs = 120, seed = 1) %>%
  set_mode("regression")

normalized_rec <-
  recipe(compressive_strength ~ ., data = concrete_train) %>%
  step_normalize(all_predictors())

auto_wflow <-
  workflow() %>%
  add_model(auto_spec) %>%
  add_recipe(normalized_rec)

auto_fit <- fit(auto_wflow, data = concrete_train)
saveRDS(auto_fit, file = "test.h2o.auto_fit.rds") #save the object
h2o_end()

There I tried to save the auto_fit object to a file. But when I tried to retrieve it and use it to predict on the test data:

h2o_start()
auto_fit <- readRDS("test.h2o.auto_fit.rds")
predict(auto_fit, new_data = concrete_test)

I got an error:

Error in `h2o_get_model()`:
! Model id does not exist on the h2o server.

What is the right way to go about this?

The expected result is:

predict(auto_fit, new_data = concrete_test)
#> # A tibble: 260 × 1
#>    .pred
#>    <dbl>
#>  1  40.0
#>  2  43.0
#>  3  38.2
#>  4  55.7
#>  5  41.4
#>  6  28.1
#>  7  53.2
#>  8  34.5
#>  9  51.1
#> 10  37.9
#> # … with 250 more rows

Update

After following Simon Couch's advice:

auto_fit <- fit(auto_wflow, data = concrete_train)
auto_fit_bundle <- bundle(auto_fit)
saveRDS(auto_fit_bundle, file = "test.h2o.auto_fit.rds") #save the object
h2o_end()

# to reload
h2o_start()
auto_fit_bundle <- readRDS("test.h2o.auto_fit.rds")
auto_fit <- unbundle(auto_fit_bundle)
predict(auto_fit, new_data = concrete_test)

rank_results(auto_fit)

I got this error message:

Error in UseMethod("rank_results") : 
  no applicable method for 'rank_results' applied to an object of class "c('H2ORegressionModel', 'H2OModel', 'Keyed')"

Answer 1

Some model objects in R require native serialization methods to be saved to and reloaded from a file. h2o objects (and thus the tidymodels objects that wrap them) are one example.
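Why a plain saveRDS() is not enough: as far as h2o's R interface goes, the object you hold in R is essentially a handle (a model id) to a model kept in the H2O server's memory, which matches the "Model id does not exist on the h2o server" error above. The sketch below simulates that situation without h2o. The "server" environment and the helpers fit_remote() and predict_remote() are hypothetical stand-ins written for this illustration; they are not part of h2o, agua, or bundle.

```r
# Hypothetical stand-in for the H2O server: model state lives here, not in the R object.
server <- new.env()

# Hypothetical fit: stores the "model" on the server and returns only a handle (its id).
fit_remote <- function(id, slope) {
  assign(id, list(slope = slope), envir = server)
  structure(list(model_id = id), class = "remote_model")
}

# Hypothetical predict: looks the model up on the server by id.
predict_remote <- function(handle, x) {
  if (!exists(handle$model_id, envir = server, inherits = FALSE)) {
    stop("Model id does not exist on the server.")
  }
  get(handle$model_id, envir = server)$slope * x
}

handle <- fit_remote("model_1", slope = 2)
path <- tempfile(fileext = ".rds")
saveRDS(handle, path)                          # saves only the id, not the model itself

rm(list = ls(envir = server), envir = server)  # simulate h2o_end(): server state is lost

restored <- readRDS(path)                      # the handle comes back fine...
result <- tryCatch(predict_remote(restored, 1:3),
                   error = function(e) conditionMessage(e))
print(result)                                  # ...but the model it points to is gone
```

The failure happens at prediction time, not at readRDS(), which is the same pattern as in the question.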

The tidymodels and vetiver teams at Posit recently collaborated on a package, bundle, that provides a consistent interface to native serialization methods. Its documentation includes a page specifically on h2o.

library(bundle)

In short, you will want to bundle() the object you're preparing to save and save it with the usual saveRDS(). Then, in your new session, start h2o again with h2o_start(), load the file with readRDS(), and unbundle() the loaded-in object. The output of unbundle() is your ready-to-go model object. :)
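The bundle, saveRDS(), readRDS(), unbundle() sequence can be illustrated with the same kind of toy, h2o-free setup: before saving, copy the server-side state into a plain R object; after reloading, push that state back onto the (new) server and hand back a fresh handle. Everything here (server, fit_remote(), predict_remote(), toy_bundle(), toy_unbundle()) is hypothetical and only mimics the idea; the real bundle package does the equivalent for h2o through h2o's native serialization, and its internals differ.

```r
# Hypothetical stand-in for the H2O server: model state lives here, not in the R object.
server <- new.env()

fit_remote <- function(id, slope) {
  assign(id, list(slope = slope), envir = server)
  structure(list(model_id = id), class = "remote_model")
}

predict_remote <- function(handle, x) {
  if (!exists(handle$model_id, envir = server, inherits = FALSE)) {
    stop("Model id does not exist on the server.")
  }
  get(handle$model_id, envir = server)$slope * x
}

# Hypothetical bundle: embed the server-side state in a self-contained R object.
toy_bundle <- function(handle) {
  list(model_id = handle$model_id, state = get(handle$model_id, envir = server))
}

# Hypothetical unbundle: restore the state on the running server, return a new handle.
toy_unbundle <- function(bundled) {
  assign(bundled$model_id, bundled$state, envir = server)
  structure(list(model_id = bundled$model_id), class = "remote_model")
}

handle <- fit_remote("model_1", slope = 2)
path <- tempfile(fileext = ".rds")
saveRDS(toy_bundle(handle), path)              # saves the model state, not just the id

rm(list = ls(envir = server), envir = server)  # simulate h2o_end()

restored <- toy_unbundle(readRDS(path))        # simulate unbundle() after h2o_start()
preds <- predict_remote(restored, 1:3)
print(preds)
```

This is also why unbundle() has to run after h2o_start(): restoring the model needs a live server to load it into.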

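# to save:
auto_fit <- fit(auto_wflow, data = concrete_train)
auto_fit_bundle <- bundle(auto_fit)  # wrap the fit using h2o's native serialization
saveRDS(auto_fit_bundle, file = "test.h2o.auto_fit.rds") #save the object
h2o_end()

# to reload (e.g. in a new R session)
library(tidymodels)
library(agua)
library(bundle)
h2o_start()  # unbundle() needs a running h2o server
auto_fit_bundle <- readRDS("test.h2o.auto_fit.rds")
auto_fit <- unbundle(auto_fit_bundle)

# recreate the test set with the same seed and split as before
data(concrete)
set.seed(4595)
concrete_split <- initial_split(concrete, strata = compressive_strength)
concrete_test <- testing(concrete_split)
predict(auto_fit, new_data = concrete_test)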

Answer 1 (score 2), 2022-12-14 15:09:57