How to predict the test set's confidence interval using a tuned model from tidymodels in R?
I am fitting a random forest model using tidymodels in R, and an error occurs when I try to predict on the test set using the tuned model: Each element of `splits` must be an `rsplit` object.
library(tidymodels)

# Data splitting
data(Sacramento, package = "modeldata")
set.seed(123)
data_split <- initial_split(Sacramento, prop = 0.75, strata = price)
Sac_train <- training(data_split)
Sac_test <- testing(data_split)

# Build the model
rf_mod <- rand_forest(mtry = tune(), min_n = tune(), trees = 1000) %>%
  set_engine("ranger", importance = "permutation") %>%
  set_mode("regression")

# Create the recipe
Sac_recipe <- recipe(price ~ ., data = Sac_train) %>%
  step_rm(zip, latitude, longitude) %>%
  step_corr(all_numeric_predictors(), threshold = 0.85) %>%
  step_zv(all_numeric_predictors()) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_dummy(all_nominal_predictors())

# Create the workflow
rf_workflow <- workflow() %>%
  add_model(rf_mod) %>%
  add_recipe(Sac_recipe)

# Train and tune the model
set.seed(123)
Sac_folds <- vfold_cv(Sac_train, v = 10, repeats = 2, strata = price)
rf_res <- rf_workflow %>%
  tune_grid(grid = 2*2,
            resamples = Sac_folds,
            control = control_grid(save_pred = TRUE),
            metrics = metric_set(rmse))

# Extract the best model
rf_best <- rf_res %>%
  select_best(metric = "rmse")

# Last fit
last_rf_workflow <- rf_workflow %>%
  finalize_workflow(rf_best)
last_rf_fit <- last_rf_workflow %>%
  last_fit(Sac_train)
# Error: Each element of `splits` must be an `rsplit` object.
predict(last_rf_fit, Sac_test, type = "conf_int")
The error comes from these lines:
last_rf_fit <- last_rf_workflow %>%
  last_fit(Sac_train)
The documentation of last_fit gives this signature:
# S3 method for workflow
last_fit(object, split, ..., metrics = NULL, control = control_last_fit())
So the workflow object is passed to last_fit as the first argument via %>%, and Sac_train is passed to the split parameter.
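As a small illustration of how the pipe fills in the first argument, here is a sketch using a hypothetical stand-in function, show_args(), which is not part of tidymodels and only mimics the first two argument names of last_fit:

```r
library(magrittr)

# Hypothetical stand-in that mimics the first two argument names of last_fit().
show_args <- function(object, split) {
  paste0("object = ", object, ", split = ", split)
}

# The left-hand side of %>% becomes the first argument (object),
# so the value written inside the call lands in the second one (split).
piped  <- "workflow" %>% show_args("Sac_train")
direct <- show_args("workflow", "Sac_train")

identical(piped, direct)
```

Both calls bind the arguments the same way, which is why Sac_train ends up in split in the failing code.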
But according to the docs, the split argument needs to be:
An rsplit object created from rsample::initial_split()
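A quick way to see the difference between the two objects is to check their classes. This is a minimal sketch that rebuilds the split from the question with rsample alone:

```r
library(rsample)

data(Sacramento, package = "modeldata")
set.seed(123)
data_split <- initial_split(Sacramento, prop = 0.75, strata = price)
Sac_train <- training(data_split)

# The split object carries the "rsplit" class that last_fit() checks for.
inherits(data_split, "rsplit")  # TRUE

# The training set is a plain data frame (tibble), not an rsplit.
inherits(Sac_train, "rsplit")   # FALSE
```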
Instead, pass the rsplit object, data_split:
last_rf_fit <- last_rf_workflow %>%
  last_fit(data_split)
Then, to collect the test set predictions, following the docs:
collect_predictions(last_rf_fit)
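Since the question asks for confidence intervals, here is a hedged sketch of one way that might be approached. It makes several assumptions that are not in the original post: fixed hyperparameters instead of the tuned ones, a three-predictor formula instead of the recipe, and keep.inbag = TRUE in set_engine, which ranger needs in order to compute the standard errors behind type = "conf_int". The names rf_wf, final_fit, fitted_wf and test_ci are illustrative only.

```r
library(tidymodels)

data(Sacramento, package = "modeldata")
set.seed(123)
data_split <- initial_split(Sacramento, prop = 0.75, strata = price)
Sac_test <- testing(data_split)

# Assumption: fixed hyperparameters to keep the sketch short, and
# keep.inbag = TRUE so ranger can estimate standard errors later.
rf_mod <- rand_forest(mtry = 2, min_n = 5, trees = 500) %>%
  set_engine("ranger", keep.inbag = TRUE) %>%
  set_mode("regression")

# Assumption: a simple formula preprocessor in place of the recipe.
rf_wf <- workflow() %>%
  add_model(rf_mod) %>%
  add_formula(price ~ beds + baths + sqft)

# Fit on the training portion and evaluate on the test portion of the split.
final_fit <- last_fit(rf_wf, data_split)
test_preds <- collect_predictions(final_fit)

# last_fit() returns a results tibble, not a model, so pull out the
# fitted workflow before calling predict().
fitted_wf <- extract_workflow(final_fit)
test_ci <- predict(fitted_wf, new_data = Sac_test, type = "conf_int")
head(test_ci)
```

The intervals come back in the .pred_lower and .pred_upper columns. They describe uncertainty in the forest's mean prediction as estimated by ranger, so they should be read with care rather than as exact coverage guarantees.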