How to save a parsnip/agua-based H2O object and retrieve it again

Question

I have the following script using tidymodels' agua package:

library(tidymodels)
library(agua)
library(ggplot2)
theme_set(theme_bw())
h2o_start()

data(concrete)
set.seed(4595)
concrete_split <- initial_split(concrete, strata = compressive_strength)
concrete_train <- training(concrete_split)
concrete_test <- testing(concrete_split)

# run for a maximum of 120 seconds
auto_spec <-
  auto_ml() %>%
  set_engine("h2o", max_runtime_secs = 120, seed = 1) %>%
  set_mode("regression")

normalized_rec <-
  recipe(compressive_strength ~ ., data = concrete_train) %>%
  step_normalize(all_predictors())

auto_wflow <-
  workflow() %>%
  add_model(auto_spec) %>%
  add_recipe(normalized_rec)

auto_fit <- fit(auto_wflow, data = concrete_train)
saveRDS(auto_fit, file = "test.h2o.auto_fit.rds") #save the object
h2o_end()

Here I tried to save the auto_fit object to a file. But when I tried to reload it and use it to predict on the test data:

h2o_start()
auto_fit <- readRDS("test.h2o.auto_fit.rds")
predict(auto_fit, new_data = concrete_test)

I got an error:

Error in `h2o_get_model()`:
! Model id does not exist on the h2o server.

What's the way to go about it?

The expected result is:

predict(auto_fit, new_data = concrete_test)
#> # A tibble: 260 × 1
#>    .pred
#>    <dbl>
#>  1  40.0
#>  2  43.0
#>  3  38.2
#>  4  55.7
#>  5  41.4
#>  6  28.1
#>  7  53.2
#>  8  34.5
#>  9  51.1
#> 10  37.9
#> # … with 250 more rows

Update

After following Simon Couch's advice:

auto_fit <- fit(auto_wflow, data = concrete_train)
auto_fit_bundle <- bundle(auto_fit)
saveRDS(auto_fit_bundle, file = "test.h2o.auto_fit.rds") #save the object
h2o_end()

# to reload
h2o_start()
auto_fit_bundle <- readRDS("test.h2o.auto_fit.rds")
auto_fit <- unbundle(auto_fit_bundle)
predict(auto_fit, new_data = concrete_test)

rank_results(auto_fit)

I got this error message:

Error in UseMethod("rank_results") : 
  no applicable method for 'rank_results' applied to an object of class "c('H2ORegressionModel', 'H2OModel', 'Keyed')"

Answer 1

Some model objects in R need native serialization methods to be saved to and reloaded from a file. h2o objects (and thus the tidymodels objects that wrap them) are one example.

The tidymodels and vetiver teams at Posit recently collaborated on a package, bundle, that provides a consistent interface to native serialization methods. The bundle documentation includes a section on h2o.

library(bundle)

In short, you will want to bundle() the object you're preparing to save, save it with the usual saveRDS(), and then, in your new session, readRDS() and unbundle() the loaded-in object. The output of unbundle() is your ready-to-go model object. :)

# to save:
auto_fit <- fit(auto_wflow, data = concrete_train)
auto_fit_bundle <- bundle(auto_fit)
saveRDS(auto_fit_bundle, file = "test.h2o.auto_fit.rds") #save the object
h2o_end()

# to reload
h2o_start()
auto_fit_bundle <- readRDS("test.h2o.auto_fit.rds")
auto_fit <- unbundle(auto_fit_bundle)
predict(auto_fit, new_data = concrete_test)
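
The question's update shows rank_results() failing on the unbundled object. The class in the error message (H2ORegressionModel) suggests that unbundle() gives back a single H2O model rather than the full AutoML result, so the leaderboard may not be recoverable from it. One possible workaround, sketched here on the assumption that rank_results() returns an ordinary tibble, is to compute the ranking before shutting down the server and save it separately. The file name test.h2o.auto_ranks.rds is made up for this example.

```r
library(tidymodels)
library(agua)
library(bundle)

# Assumes `auto_fit` is the fitted workflow from the code above and that the
# h2o server it was trained on is still running (i.e. before h2o_end()).
auto_ranks <- rank_results(auto_fit)

# The ranking should be plain R data, so saveRDS() alone is enough for it.
# "test.h2o.auto_ranks.rds" is a placeholder file name for this sketch.
saveRDS(auto_ranks, file = "test.h2o.auto_ranks.rds")

# Bundle and save the model itself, as in the answer above.
saveRDS(bundle(auto_fit), file = "test.h2o.auto_fit.rds")
h2o_end()

# In a later session, the ranking can be read back without an h2o server.
auto_ranks_reloaded <- readRDS("test.h2o.auto_ranks.rds")
```

The reloaded ranking is only a snapshot of the leaderboard. It can be filtered and plotted like any data frame, but it does not give access to the non-leader models themselves.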
