
How to change the loss matrix in XGBoost (in R)?

I want to classify a binary variable where the cost of a false positive is higher than the cost of a false negative.

In the rpart package we use a loss matrix, adding parms = list(loss = matrix(c(0, 1, 5, 0), nrow = 2)) so that mis-classifying a negative example as positive costs 5 times more than mis-classifying a positive example as negative.

How can I do that with XGBoost?

Are you looking for the scale_pos_weight parameter?

https://github.com/dmlc/xgboost/blob/master/doc/parameter.md

scale_pos_weight [default=1]: Controls the balance of positive and negative weights; useful for unbalanced classes. A typical value to consider: sum(negative cases) / sum(positive cases). See Parameters Tuning for more discussion. Also see the Higgs Kaggle competition demo for examples: R, py1, py2, py3.
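The "typical value" rule of thumb above is just a class-frequency ratio, so it can be computed directly from the training labels. A minimal sketch (the label vector here is hypothetical, purely for illustration):

```python
# Hypothetical 0/1 label vector; in practice, use your actual training labels.
y = [0, 0, 0, 0, 0, 1]  # 5 negative cases, 1 positive case

neg = sum(1 for v in y if v == 0)  # count of negative cases
pos = sum(1 for v in y if v == 1)  # count of positive cases

# Suggested starting point per the xgboost parameter docs:
scale_pos_weight = neg / pos  # 5.0 for this toy vector
```

Note this ratio corrects for class imbalance; it is not the same thing as an asymmetric misclassification cost, though in practice it is often used to push the model in the same direction.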

You can use it like this:

clf = xgb.XGBClassifier(objective='binary:logistic',
                        scale_pos_weight=5,
                        max_depth=3,
                        n_estimators=100)

in Python, using the sklearn API. (Note it should be XGBClassifier, not XGBRegressor, for a binary:logistic objective.)

Assuming you are using the xgboost package, you can use the watchlist parameter. It is a list of xgb.DMatrix objects, each tagged with a name. You can also use the eval.metric parameter; multiple evaluation metrics are allowed.

watchlist <- list(train = dtrain, test = dtest)

bst <- xgb.train(data = dtrain, max.depth = 2, eta = 1, nthread = 2,
                 eval.metric = "error", eval.metric = "logloss",
                 nrounds = 2, watchlist = watchlist,
                 objective = "binary:logistic")

If the extensive list of metrics on the xgboost GitHub pages does not meet your needs, then, as the docs say, you can define your own metric, e.g. a weighted sum of false positives and false negatives where a false positive is weighted five times more than a false negative.
