R Tidymodels/Embed recipe "step_woe" not working

I'm trying to add a "step_woe" step to a recipe that already contains a "step_discretize_xgb" step, but I keep getting an error message about the types of the variables I need to transform with step_woe.

Here's a short example of my code, with only one variable.


library(embed)
library(tidymodels)
library(tidyverse)
library(xgboost)

TG <- sample(c(0,1), 1000, replace = TRUE)

V1 <- rnorm(1000)

train <- tibble(VARIABLE_1 = V1,
                TARGET = TG)

rec <- recipes::recipe(TARGET ~ ., 
                        data = train) %>% 
  step_discretize_xgb(all_numeric_predictors(), 
                      outcome = vars(TARGET)) %>% 
  step_woe(all_of("VARIABLE_1"),
           outcome = vars(TARGET)) %>% 
  prep(training = train)

PS: I've checked that this variable is a factor and that it is binned. I also tried it without "all_of" and the quotes, i.e., with just VARIABLE_1.

The message is:

Error in check_type() : All columns selected for the step should be factor or character
Backtrace:

  1. ... %>% prep(training = train)
  2. recipes:::prep.recipe(., training = train)
  3. embed:::prep.step_woe(x$steps[[i]], training = training, info = x$term_info)
  4. recipes::check_type(training[, outcome_name], quant = FALSE)

Error in check_type(training[, outcome_name], quant = FALSE):

This is an unfortunate error message from {embed}. You are getting this error because the outcome of step_woe() needs to be a categorical variable (factor or character), and in your example TARGET is numeric; the check that fails in the backtrace is the one on the outcome column, not on VARIABLE_1. Since TG appears to be a categorical variable, you can code it as such and it will work.

I have opened an issue to make this error clearer: https://github.com/tidymodels/embed/issues/147

library(embed)
library(tidymodels)
library(tidyverse)
library(xgboost)


TG <- sample(c("0", "1"), 1000, replace = TRUE)

V1 <- rnorm(1000)

train <- tibble(VARIABLE_1 = V1,
                TARGET = TG)

rec <- recipes::recipe(TARGET ~ ., 
                       data = train) %>% 
  step_discretize_xgb(all_numeric_predictors(), 
                      outcome = vars(TARGET)) %>% 
  step_woe(all_of("VARIABLE_1"),
           outcome = vars(TARGET)) %>% 
  prep(training = train)

rec
#> Recipe
#> 
#> Inputs:
#> 
#>       role #variables
#>    outcome          1
#>  predictor          1
#> 
#> Training data contained 1000 data points and no missing data.
#> 
#> Operations:
#> 
#> Discretizing variables using xgboost VARIABLE_1 [trained]
#> WoE version against outcome TARGET for VARIABLE_1 [trained]

Created on 2022-11-21 with reprex v2.0.2
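The reprex above fixes the problem by generating TG as character from the start. If the outcome already exists as a numeric 0/1 column in a data frame, the same idea can be applied by converting that column before the recipe is built. The following is only a minimal sketch in base R: the data frame `train` mirrors the question's simulated data, and the seed and the use of `factor()` with explicit levels are assumptions made for illustration, not part of the original answer.

```r
# Simulated data mirroring the question: the outcome is numeric 0/1.
# (The seed is an arbitrary choice for reproducibility.)
set.seed(123)
train <- data.frame(
  VARIABLE_1 = rnorm(1000),
  TARGET     = sample(c(0, 1), 1000, replace = TRUE)
)

# A numeric outcome is what trips the check on the outcome column.
class(train$TARGET)
#> [1] "numeric"

# Recode the outcome as a factor before passing the data to recipe().
train$TARGET <- factor(train$TARGET, levels = c(0, 1))

is.factor(train$TARGET)
#> [1] TRUE
levels(train$TARGET)
#> [1] "0" "1"
```

After this conversion, the recipe from the reprex can be built on `train` unchanged, since TARGET is now a factor rather than a number.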
