
Troubleshooting XGBoost in R

I have a dataset with 25,000 rows and 761 columns, one of which is a binary response column. The response takes the values '-1' and '1'. I am trying to run xgboost on it with the call below, and keep getting the error shown after it:

xg_base<-xgboost(data = features,label = output,objective="binary:logistic",eta=1,nthreads=2,nrounds = 10
             , verbose = T, print.every.n = 5)


Error in xgb.iter.update(bst$handle, dtrain, i - 1, obj) : 
label must be in [0,1] for logistic regression

I changed the levels of my response using the following command:

levels(output)[levels(output)=="-1"] <- "0"

I still get the same error, and I am not sure what exactly the issue is. One important point is that this is a rare event detection problem: positive cases make up 1% of the total observations. Could that be the reason I'm getting the error?

In case this helps someone converting a factor variable with levels 0 and 1 into labels for XGBoost: you need to subtract 1 after converting to integer (or numeric), because the conversion returns the factor's internal level codes (1 and 2) rather than the values 0 and 1:

> f <- as.factor(c(0, 1, 1, 0))

# XGBoost will not accept this for label
> as.integer(f)
[1] 1 2 2 1

# Correct label
> as.integer(f) - 1
[1] 0 1 1 0
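One caveat worth illustrating: subtracting 1 from the level codes maps the first level to 0 and the second to 1, so the result depends on the order of the levels rather than on what they are called. The sketch below uses made-up factors `f` and `g` (not from the question) to compare that approach with an explicit comparison against the positive level, which does not depend on level order.

```r
# Hypothetical response with levels "-1" and "1" in the default sorted order.
f <- factor(c("-1", "1", "1", "-1"))

# Level codes minus one: the first level becomes 0, the second becomes 1.
by_code <- as.integer(f) - 1L
print(by_code)    # 0 1 1 0

# Explicit comparison against the positive level.
by_value <- as.integer(f == "1")
print(by_value)   # 0 1 1 0

# Same data, but with the levels declared in the opposite order.
g <- factor(c("-1", "1", "1", "-1"), levels = c("1", "-1"))

# Code-based conversion now flips the labels ...
flipped <- as.integer(g) - 1L
print(flipped)    # 1 0 0 1

# ... while the explicit comparison is unaffected.
stable <- as.integer(g == "1")
print(stable)     # 0 1 1 0
```

If the level order of your factor is not something you control, the comparison form is probably the safer choice.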

After you change the -1's to 0's, convert output from a factor to numeric:

output <- as.numeric(levels(output))[output]
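Putting the two steps together, here is a minimal sketch in base R. The small `output` vector is a made-up stand-in for the question's response column, and the final check is only a sanity test I would add before passing the result as `label`. It also suggests why renaming the levels alone was likely not enough: the object was still a factor, whose internal codes are 1 and 2.

```r
# Hypothetical stand-in for the question's response: a factor with levels -1 and 1.
output <- factor(c(-1, 1, -1, -1, 1))

# Step 1 (from the question): rename the level "-1" to "0".
levels(output)[levels(output) == "-1"] <- "0"

# Still a factor at this point; its internal codes are 1 and 2, not 0 and 1.
print(as.integer(output))   # 1 2 1 1 2

# Step 2 (from the answer): convert the factor to the numeric values of its levels.
output <- as.numeric(levels(output))[output]
print(output)               # 0 1 0 0 1

# Sanity check before using `output` as the label.
stopifnot(is.numeric(output), all(output %in% c(0, 1)))
```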

I don't think the fact that this is a rare event detection problem is related to the error. The message is about the range of the label values, not about how often each class occurs.


 