
How to save a parsnip/agua-based H2O object and retrieve it again

I have the following script using tidymodels' agua package:

library(tidymodels)
library(agua)
library(ggplot2)
theme_set(theme_bw())
h2o_start()

data(concrete)
set.seed(4595)
concrete_split <- initial_split(concrete, strata = compressive_strength)
concrete_train <- training(concrete_split)
concrete_test <- testing(concrete_split)

# run for a maximum of 120 seconds
auto_spec <-
  auto_ml() %>%
  set_engine("h2o", max_runtime_secs = 120, seed = 1) %>%
  set_mode("regression")

normalized_rec <-
  recipe(compressive_strength ~ ., data = concrete_train) %>%
  step_normalize(all_predictors())

auto_wflow <-
  workflow() %>%
  add_model(auto_spec) %>%
  add_recipe(normalized_rec)

auto_fit <- fit(auto_wflow, data = concrete_train)
saveRDS(auto_fit, file = "test.h2o.auto_fit.rds") #save the object
h2o_end()

There, I tried to save the auto_fit object to a file. But when I tried to load it again and use it to predict on the test data:

h2o_start()
auto_fit <- readRDS("test.h2o.auto_fit.rds")
predict(auto_fit, new_data = concrete_test)

I got an error:

Error in `h2o_get_model()`:
! Model id does not exist on the h2o server.

What is the right way to go about this?

The expected result is:

predict(auto_fit, new_data = concrete_test)
#> # A tibble: 260 × 1
#>    .pred
#>    <dbl>
#>  1  40.0
#>  2  43.0
#>  3  38.2
#>  4  55.7
#>  5  41.4
#>  6  28.1
#>  7  53.2
#>  8  34.5
#>  9  51.1
#> 10  37.9
#> # … with 250 more rows

Update

After following Simon Couch's advice:

auto_fit <- fit(auto_wflow, data = concrete_train)
auto_fit_bundle <- bundle(auto_fit)
saveRDS(auto_fit_bundle, file = "test.h2o.auto_fit.rds") #save the object
h2o_end()

# to reload
h2o_start()
auto_fit_bundle <- readRDS("test.h2o.auto_fit.rds")
auto_fit <- unbundle(auto_fit_bundle)
predict(auto_fit, new_data = concrete_test)

rank_results(auto_fit)

I got this error message:

Error in UseMethod("rank_results") : 
  no applicable method for 'rank_results' applied to an object of class "c('H2ORegressionModel', 'H2OModel', 'Keyed')"

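Some model objects in R require native serialization methods to be saved to and reloaded from a file, and h2o objects (and thus the tidymodels objects that wrap them) are one example. The R object for an h2o model is only a reference, by model id, to a model that lives on the H2O server. saveRDS() stores that reference but not the model itself, so once the server is shut down with h2o_end(), the reloaded object points to a model id that the new server has never seen. That is exactly the "Model id does not exist on the h2o server" error.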
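To make that mechanism concrete, here is a toy sketch in base R that assumes nothing from h2o or bundle: an environment plays the role of the H2O server, and the helpers toy_fit(), toy_predict(), toy_bundle() and toy_unbundle() are hypothetical stand-ins, not the real APIs. It shows why saving only a handle fails once the server state is wiped, and why copying the server-side state into the saved object (roughly what bundle() does for h2o models via H2O's native serialization) survives the round trip.

```r
# Toy illustration only: `fake_server` and the toy_*() helpers are
# hypothetical stand-ins, NOT the h2o or bundle APIs.

# An environment standing in for the H2O server (a separate process)
fake_server <- new.env()

# "Fitting" stores the model state on the server and returns only a
# lightweight handle holding the model id, like an h2o model object
toy_fit <- function(id, coefs) {
  assign(id, coefs, envir = fake_server)
  structure(list(model_id = id), class = "toy_handle")
}

# Predicting looks the model up on the server by id
toy_predict <- function(handle, x) {
  if (!exists(handle$model_id, envir = fake_server, inherits = FALSE)) {
    stop("Model id does not exist on the server.")
  }
  coefs <- get(handle$model_id, envir = fake_server, inherits = FALSE)
  coefs[1] + coefs[2] * x
}

# Bundling copies the server-side state into the R object itself
toy_bundle <- function(handle) {
  state <- get(handle$model_id, envir = fake_server, inherits = FALSE)
  structure(list(model_id = handle$model_id, state = state),
            class = "toy_bundle")
}

# Unbundling re-registers that state on the (possibly fresh) server
toy_unbundle <- function(b) {
  assign(b$model_id, b$state, envir = fake_server)
  structure(list(model_id = b$model_id), class = "toy_handle")
}

handle <- toy_fit("model_1", c(2, 3))
path_plain  <- tempfile(fileext = ".rds")
path_bundle <- tempfile(fileext = ".rds")
saveRDS(handle, path_plain)               # saves only the id
saveRDS(toy_bundle(handle), path_bundle)  # saves the id plus the state

# Simulate h2o_end() followed by h2o_start(): server-side state is gone
rm(list = ls(fake_server), envir = fake_server)

# Plain saveRDS()/readRDS(): the handle is dangling
reloaded <- readRDS(path_plain)
plain_result <- tryCatch(toy_predict(reloaded, 1), error = conditionMessage)
print(plain_result)
#> [1] "Model id does not exist on the server."

# Bundled round trip: unbundling restores the state, so prediction works
restored <- toy_unbundle(readRDS(path_bundle))
print(toy_predict(restored, 1))
#> [1] 5
```

The design point is that whatever gets passed to saveRDS() must already contain the model's state, because nothing on the server survives h2o_end().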

The tidymodels and vetiver teams at Posit recently collaborated on a package, bundle, that provides a consistent interface to native serialization methods. The bundle documentation has a page specifically on h2o.

library(bundle)

In short, you will want to bundle() the object you're preparing to save, save it with the usual saveRDS(), and then, in your new session, readRDS() and unbundle() the loaded-in object. The output of unbundle() is your ready-to-go model object. :)

# to save:
auto_fit <- fit(auto_wflow, data = concrete_train)
auto_fit_bundle <- bundle(auto_fit)
saveRDS(auto_fit_bundle, file = "test.h2o.auto_fit.rds") # save the bundled object
h2o_end()

# to reload (e.g. in a new R session, reload the packages first):
library(tidymodels)
library(agua)
library(bundle)
h2o_start()
auto_fit_bundle <- readRDS("test.h2o.auto_fit.rds")
auto_fit <- unbundle(auto_fit_bundle)
predict(auto_fit, new_data = concrete_test) # concrete_test created as in the question
