简体   繁体   中英

LIME (with h2o) explanation error

dI'm new to R and ML but have a focused question that I am trying to answer.

I'm using my own data but following Matt Dancho's example here to predict attrition: http://www.business-science.io/business/2017/09/18/hr_employee_attrition.html

I have removed zero variance and scaled variables as per his update.

My issue is running the explain() on explainer step. I get variations of both errors below (in bold) when I run the former original code and the latter variation. Everything else runs up to that point.

explanation <- lime::explain(
as.data.frame(test_h2o[1:10,-1]), 
explainer    = explainer, 
n_labels     = 1, 
n_features   = 4,
kernel_width = 0.5)

gives:

Error during wrapup: arguments imply differing number of rows: 50000, 0

While

explanation <- lime::explain(
as.data.frame(test_h2o[1:500,-1]), 
explainer    = explainer, 
n_labels     = 1, 
n_features   = 5,
kernel_width = 1)

Gives:

ERROR: Unexpected HTTP Status code: 500 Server Error (url = http://localhost:54321/3/PostFile?destination_frame=C%3A%2FUsers%2Fsim.s%2FAppData%2FLocal%2FTemp%2FRtmpykNkl1%2Ffileb203a8d4a58.csv_sid_afd3_26)
Error: lexical error: invalid char in json text.
<html> <head> <meta http-equiv=
                 (right here) ------^

Please let me know if you have any ideas or insights for this problem, or need additional info from me.

Try this and let me know what you get. Note that this assumes your excel file is stored in a folder called "data" in your working directory. Use getwd() and setwd() to get/set the working directory (or use Projects in RStudio IDE).

library(h2o)        # Professional grade ML pkg
library(tidyquant)  # Loads tidyverse and several other pkgs 
library(readxl)     # Super simple excel reader
library(lime)       # Explain complex black-box ML models
library(recipes)    # Preprocessing for machine learning

hr_data_raw_tbl <- read_excel(path = "data/WA_Fn-UseC_-HR-Employee-Attrition.xlsx")

hr_data_organized_tbl <- hr_data_raw_tbl %>%
  mutate_if(is.character, as.factor) %>%
  select(Attrition, everything())

recipe_obj <- hr_data_organized_tbl %>%
  recipe(formula = Attrition ~ .) %>%
  step_rm(EmployeeNumber) %>%
  step_zv(all_predictors()) %>%
  step_center(all_numeric()) %>%
  step_scale(all_numeric()) %>%
  prep(data = hr_data_organized_tbl)

hr_data_bake_tbl <- bake(recipe_obj, newdata = hr_data_organized_tbl) 

h2o.init()

hr_data_bake_h2o <- as.h2o(hr_data_bake_tbl)

hr_data_split <- h2o.splitFrame(hr_data_bake_h2o, ratios = c(0.7, 0.15), seed = 1234)

train_h2o <- h2o.assign(hr_data_split[[1]], "train" ) # 70%
valid_h2o <- h2o.assign(hr_data_split[[2]], "valid" ) # 15%
test_h2o  <- h2o.assign(hr_data_split[[3]], "test" )  # 15%

y <- "Attrition"
x <- setdiff(names(train_h2o), y)

automl_models_h2o <- h2o.automl(
  x = x, 
  y = y,
  training_frame    = train_h2o,
  validation_frame  = valid_h2o,
  leaderboard_frame = test_h2o,
  max_runtime_secs  = 15
)

automl_leader <- automl_models_h2o@leader

explainer <- lime::lime(
  as.data.frame(train_h2o[,-1]), 
  model          = automl_leader, 
  bin_continuous = FALSE
)

explanation <- lime::explain(
  x              = as.data.frame(test_h2o[1:10,-1]), 
  explainer      = explainer, 
  n_labels       = 1, 
  n_features     = 4,
  n_permutations = 500,
  kernel_width   = 1
)

explanation

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM