LIME (with h2o) explanation error

Question

dI'm new to R and ML but have a focused question that I am trying to answer.

I'm using my own data but following Matt Dancho's example here to predict attrition: http://www.business-science.io/business/2017/09/18/hr_employee_attrition.html

I have removed zero variance and scaled variables as per his update.

My issue is running the explain() on explainer step. I get variations of both errors below (in bold) when I run the former original code and the latter variation. Everything else runs up to that point.

explanation <- lime::explain(
as.data.frame(test_h2o[1:10,-1]), 
explainer    = explainer, 
n_labels     = 1, 
n_features   = 4,
kernel_width = 0.5)

gives:

Error during wrapup: arguments imply differing number of rows: 50000, 0

While

explanation <- lime::explain(
as.data.frame(test_h2o[1:500,-1]), 
explainer    = explainer, 
n_labels     = 1, 
n_features   = 5,
kernel_width = 1)

Gives:

ERROR: Unexpected HTTP Status code: 500 Server Error (url = http://localhost:54321/3/PostFile?destination_frame=C%3A%2FUsers%2Fsim.s%2FAppData%2FLocal%2FTemp%2FRtmpykNkl1%2Ffileb203a8d4a58.csv_sid_afd3_26)
Error: lexical error: invalid char in json text.
<html> <head> <meta http-equiv=
                 (right here) ------^

Please let me know if you have any ideas or insights for this problem, or need additional info from me.

Answer 1

Try this and let me know what you get. Note that this assumes your excel file is stored in a folder called "data" in your working directory. Use getwd() and setwd() to get/set the working directory (or use Projects in RStudio IDE).

library(h2o)        # Professional grade ML pkg
library(tidyquant)  # Loads tidyverse and several other pkgs 
library(readxl)     # Super simple excel reader
library(lime)       # Explain complex black-box ML models
library(recipes)    # Preprocessing for machine learning

hr_data_raw_tbl <- read_excel(path = "data/WA_Fn-UseC_-HR-Employee-Attrition.xlsx")

hr_data_organized_tbl <- hr_data_raw_tbl %>%
  mutate_if(is.character, as.factor) %>%
  select(Attrition, everything())

recipe_obj <- hr_data_organized_tbl %>%
  recipe(formula = Attrition ~ .) %>%
  step_rm(EmployeeNumber) %>%
  step_zv(all_predictors()) %>%
  step_center(all_numeric()) %>%
  step_scale(all_numeric()) %>%
  prep(data = hr_data_organized_tbl)

hr_data_bake_tbl <- bake(recipe_obj, newdata = hr_data_organized_tbl) 

h2o.init()

hr_data_bake_h2o <- as.h2o(hr_data_bake_tbl)

hr_data_split <- h2o.splitFrame(hr_data_bake_h2o, ratios = c(0.7, 0.15), seed = 1234)

train_h2o <- h2o.assign(hr_data_split[[1]], "train" ) # 70%
valid_h2o <- h2o.assign(hr_data_split[[2]], "valid" ) # 15%
test_h2o  <- h2o.assign(hr_data_split[[3]], "test" )  # 15%

y <- "Attrition"
x <- setdiff(names(train_h2o), y)

automl_models_h2o <- h2o.automl(
  x = x, 
  y = y,
  training_frame    = train_h2o,
  validation_frame  = valid_h2o,
  leaderboard_frame = test_h2o,
  max_runtime_secs  = 15
)

automl_leader <- automl_models_h2o@leader

explainer <- lime::lime(
  as.data.frame(train_h2o[,-1]), 
  model          = automl_leader, 
  bin_continuous = FALSE
)

explanation <- lime::explain(
  x              = as.data.frame(test_h2o[1:10,-1]), 
  explainer      = explainer, 
  n_labels       = 1, 
  n_features     = 4,
  n_permutations = 500,
  kernel_width   = 1
)

explanation

LIME (with h2o) explanation error

Question

1 answers

solution1
0 2018-01-03 00:03:11

LIME (with h2o) explanation error

Question

1 answers

solution1 0 2018-01-03 00:03:11

solution1
0 2018-01-03 00:03:11