
Accessing watchlist history in the xgboost R package

I'm using the xgboost R package to perform a multi-class classification task. This is a piece of code I created to illustrate the problem (input and output are randomly generated, so the results of course make no sense; it's something I've done just to play around and learn how to handle the package):

require(xgboost)
# First of all I set some parameters
featureNumber = 5
num_class = 4
obs = 1000

# I declare a function that I will use to generate my categorical labels
generateLabels <- function(x,num_class){
  label <- 0
  if(runif(1,min=0,max =1) <0.1){
      label <- 0
  }else{
      label <- which.max(x) -1
      foo <- runif(1,min=0,max =1)
      if(foo > 0.9){label <- label + 1}
      if(foo < 0.1){label <- label - 1}
  }
  return(max(min(label,num_class-1),0))
}

# I generate a random train set and its labels
features <- matrix(runif(featureNumber*obs, 1, 10), ncol = featureNumber)
labels <- apply(features, 1, generateLabels,num_class = num_class) 
dTrain <- xgb.DMatrix(data = features, label = labels)

# I generate a random test set and its labels
testObs = floor(obs*0.25)
featuresTest <- matrix(runif(featureNumber*testObs, 1, 10), ncol = featureNumber)
labelsTest <- apply(featuresTest, 1, generateLabels, num_class = num_class) 
dTest <- xgb.DMatrix(data = featuresTest, label = labelsTest)

# I train the model
xgbm   <- xgb.train(data = dTrain, 
                  nrounds = 10,
                  objective = "multi:softprob", 
                  eval_metric = "mlogloss", 
                  watchlist = list(train=dTrain, eval=dTest),                          
                  num_class = num_class)

This works as expected and produces the expected results; here are a few lines:

[0] train-mlogloss:1.221495 eval-mlogloss:1.292785
[1] train-mlogloss:0.999905 eval-mlogloss:1.121077
[2] train-mlogloss:0.846809 eval-mlogloss:1.014519
[3] train-mlogloss:0.735182 eval-mlogloss:0.942461
[4] train-mlogloss:0.650207 eval-mlogloss:0.891341
[5] train-mlogloss:0.580136 eval-mlogloss:0.851774
[6] train-mlogloss:0.524390 eval-mlogloss:0.827973
[7] train-mlogloss:0.475884 eval-mlogloss:0.815081
[8] train-mlogloss:0.435342 eval-mlogloss:0.799799
[9] train-mlogloss:0.402307 eval-mlogloss:0.789209

What I cannot achieve is storing those values to use them later. Is it possible to do this? It would be very helpful for tuning the parameters.

PS: I know I could use xgb.cv, the cross-validation method included in the package, to obtain similar results, but I'd rather use this method to have more control over what happens. Also, since those metrics are already calculated, it seems to me a waste of computational power not to be able to use them beyond reading them on-screen.

You can access the best-round parameters with xgbm$bestScore and xgbm$bestInd.
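Beyond the best-round fields, recent releases of the xgboost R package also keep the full watchlist history on the returned booster, in a field named evaluation_log (a data.table with one row per boosting round). A minimal sketch, assuming such a release (the small random setup below is illustrative, mirroring the question's code, and bestScore/bestInd are typically only populated when early stopping is used):

```r
require(xgboost)

# Illustrative setup mirroring the question: random features, labels 0..3
features <- matrix(runif(5 * 100, 1, 10), ncol = 5)
labels   <- sample(0:3, 100, replace = TRUE)
dTrain   <- xgb.DMatrix(data = features, label = labels)

xgbm <- xgb.train(data        = dTrain,
                  nrounds     = 10,
                  objective   = "multi:softprob",
                  eval_metric = "mlogloss",
                  watchlist   = list(train = dTrain),
                  num_class   = 4)

# The per-round metrics printed on-screen are also stored on the booster,
# one column per watchlist entry (here: iter, train_mlogloss)
history <- xgbm$evaluation_log
head(history)

# They can then be used programmatically, e.g. to find the best round
best_round <- history$iter[which.min(history$train_mlogloss)]
```

With a two-element watchlist like list(train = dTrain, eval = dTest), the log would contain both a train_mlogloss and an eval_mlogloss column, so the eval curve can be inspected or plotted after training.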
