R Partial Dependence Plots for XGBoost
I've fitted an XGBoost model on a sparse matrix and am trying to display some partial dependence plots. I've been using the pdp package but am open to suggestions. The code below is a reproducible example of what I'm trying to do.
# load required packages
require(Matrix)
require(xgboost)
require(pdp)
# dummy data
categorical <- c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B')
numerical <- c(1, 2, 3, 4, 1, 2, 3, 4)
target <- c(100, 200, 300, 400, 500, 600, 700, 800)
data <- data.frame(categorical, numerical, target)
# create sparse matrix and run xgb
data.sparse = sparse.model.matrix(target~.-1,data)
data.xgb <- xgboost(data=data.sparse, label=data$target, nrounds=100)
# attempt to create partial dependence plots
partial(data.xgb, pred.var="numerical", plot=TRUE, rug=TRUE, train=data, type="regression")
partial(data.xgb, pred.var="categorical", plot=TRUE, rug=TRUE, train=data, type="regression")
partial(data.xgb, pred.var="categoricalA", plot=TRUE, rug=TRUE, train=data.sparse, type="regression")
partial(data.xgb, pred.var="categoricalB", plot=TRUE, rug=TRUE, train=data.sparse, type="regression")
# confirm the model is making sensible predictions despite pdp looking odd
chk <- data[2,]
chk.sparse = sparse.model.matrix(target~.-1,chk)
chk.pred <- predict(data.xgb, chk.sparse)
print(chk.pred) # gives expected values e.g. 199.9992 for second row
Questions
Many thanks
It appears you will have to set plot = FALSE so that partial returns the data, and then create your own plot. I recommend geom_crossbar for categorical variables. I looked into the code for the partial function in pdp on GitHub: there is a cats argument where you are supposed to name the categorical variables, but as far as I can see it is not used anywhere in the function. For cross-validation and grid search, use caret. This is a great resource to learn how.
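As a rough illustration of the "build your own plot" suggestion, here is a minimal sketch that computes the partial dependence of the categorical variable by hand and draws it with geom_crossbar. It does not call pdp::partial; instead it averages predictions manually, which is an assumption about what is wanted rather than the pdp implementation. The helper pd_level, the use of xgb.train with a dense copy of the design matrix, and the choice to show the min/max of the per-row predictions as the box are all illustrative choices, not part of the original post.

```r
library(Matrix)
library(xgboost)
library(ggplot2)

# Same dummy data as in the question (categorical stored as a factor)
data <- data.frame(
  categorical = factor(c("A", "A", "A", "A", "B", "B", "B", "B")),
  numerical   = c(1, 2, 3, 4, 1, 2, 3, 4),
  target      = c(100, 200, 300, 400, 500, 600, 700, 800)
)

# Design matrix with columns categoricalA, categoricalB, numerical.
# Converted to a dense matrix here only to keep the sketch simple.
X <- as.matrix(sparse.model.matrix(target ~ . - 1, data))

model <- xgb.train(
  params  = list(objective = "reg:squarederror"),
  data    = xgb.DMatrix(X, label = data$target),
  nrounds = 100,
  verbose = 0
)

# Hypothetical helper: force every row to one level of the factor,
# predict, and summarise the predictions for that level.
pd_level <- function(level) {
  X_mod <- X
  X_mod[, "categoricalA"] <- as.numeric(level == "A")
  X_mod[, "categoricalB"] <- as.numeric(level == "B")
  preds <- predict(model, xgb.DMatrix(X_mod))
  data.frame(
    categorical = level,
    yhat  = mean(preds),  # partial dependence value
    lower = min(preds),   # spread of the per-row predictions
    upper = max(preds)
  )
}

pd <- do.call(rbind, lapply(c("A", "B"), pd_level))

# One crossbar per level: the middle line is the averaged prediction
p <- ggplot(pd, aes(x = categorical, y = yhat, ymin = lower, ymax = upper)) +
  geom_crossbar(width = 0.4) +
  labs(x = "categorical", y = "Average predicted target")
print(p)
```

Setting both dummy columns together keeps the modified rows consistent with how the factor was encoded, which varying categoricalA or categoricalB on its own would not. The same ggplot call should work on a data frame returned by partial(..., plot = FALSE), provided the aesthetics are mapped to whatever columns it contains.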