
How to change the loss matrix in XGBoost (in R)?

I want to classify a binary variable where the cost of a false positive is higher than the cost of a false negative.

In the rpart package we use the loss matrix, adding parms = list(loss=matrix(c(0,1,5,0),nrow=2)) so that the cost of misclassifying a negative example as positive is 5 times higher than the cost of misclassifying a positive example as negative.
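For completeness, a minimal sketch of that rpart call; the formula y ~ . and the data frame train_df are placeholders for your own data:

library(rpart)

# Loss matrix: misclassifying a negative as positive costs 5,
# misclassifying a positive as negative costs 1
fit <- rpart(y ~ ., data = train_df, method = "class",
             parms = list(loss = matrix(c(0, 1, 5, 0), nrow = 2)))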

How can I do that with XGBoost?

Are you looking for the scale_pos_weight parameter?

https://github.com/dmlc/xgboost/blob/master/doc/parameter.md

scale_pos_weight [default=1]: Controls the balance of positive and negative weights, useful for unbalanced classes. A typical value to consider: sum(negative cases) / sum(positive cases). See Parameters Tuning for more discussion. Also see the Higgs Kaggle competition demo for examples: R, py1, py2, py3.

You can use it like this:

import xgboost as xgb

# scale_pos_weight rescales the weight of positive examples relative to negative ones
clf = xgb.XGBClassifier(objective='binary:logistic',
                        scale_pos_weight=5,
                        max_depth=3,
                        n_estimators=100)

in Python, using the sklearn API.
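Since the question is about R, here is the same idea sketched with the R xgboost package; dtrain is assumed to be an xgb.DMatrix you have already built, and y the corresponding label vector:

library(xgboost)

params <- list(objective = "binary:logistic",
               scale_pos_weight = 5,  # or sum(y == 0) / sum(y == 1) as the heuristic above
               max_depth = 3)

bst <- xgb.train(params = params, data = dtrain, nrounds = 100)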

Assuming you are using the xgboost package, you can use the watchlist parameter. It is a list of xgb.DMatrix objects, each tagged with a name. You can use the eval.metric parameter; multiple evaluation metrics are also allowed.

watchlist <- list(train = dtrain, test = dtest)

bst <- xgb.train(data = dtrain, max.depth = 2, eta = 1, nthread = 2,
                 eval.metric = "error", eval.metric = "logloss", nround = 2,
                 watchlist = watchlist, objective = "binary:logistic")

If the extensive list of metrics on the xgboost GitHub pages does not meet your need, then, as the docs say, you can define your own metric, e.g. a weighted sum of false positives and false negatives where a false positive is weighted five times more than a false negative.
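For example, a sketch of such a metric in R, passed to xgb.train via feval; the function name, the 0.5 threshold and the 5:1 weighting are illustrative assumptions, not part of xgboost:

# Weighted error: a false positive costs 5, a false negative costs 1
weighted.error <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  pred_class <- as.numeric(preds > 0.5)
  fp <- sum(pred_class == 1 & labels == 0)
  fn <- sum(pred_class == 0 & labels == 1)
  err <- (5 * fp + fn) / length(labels)
  list(metric = "weighted_error", value = err)
}

bst <- xgb.train(data = dtrain, max.depth = 2, eta = 1, nthread = 2,
                 nround = 2, watchlist = watchlist,
                 feval = weighted.error, maximize = FALSE,
                 objective = "binary:logistic")

Note that feval only changes what is reported and used for early stopping; to change the training loss itself you would supply a custom objective via the obj argument instead.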
