How can I offset exposures in a gbm model in R?

I am trying to fit a gradient boosting machine (GBM) to insurance claims. The observations have unequal exposure, so I am trying to use an offset equal to the log of the exposures. I tried two different ways:

  1. Put an offset term in the formula. This resulted in nan for the validation deviance at every iteration.

  2. Use the offset parameter in the gbm function. This parameter is listed under gbm.more. This results in an error message saying there is an unused argument.

I can't share my company's data, but I reproduced the problem using the Insurance data set in the MASS package. See the code and output below.

library(MASS)
library(gbm)

data(Insurance)

# Try using offset in the formula.
fm1 = formula(Claims ~ District + Group + Age + offset(log(Holders)))

fitgbm1 = gbm(fm1, distribution = "poisson",
              data = Insurance,
              n.trees = 10,
              shrinkage = 0.1,
              verbose = TRUE)

# Try using offset in the gbm statement.
fm2 = formula(Claims ~ District + Group + Age)
offset2 = log(Insurance$Holders)

fitgbm2 = gbm(fm2, distribution = "poisson",
              data = Insurance,
              n.trees = 10,
              shrinkage = 0.1,
              offset = offset2,
              verbose = TRUE)

This then outputs:

> source('D:/Rprojects/auto_tutorial/rcode/example_gbm.R')
Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1     -347.8959             nan     0.1000    0.0904
     2     -348.2181             nan     0.1000    0.0814
     3     -348.3845             nan     0.1000    0.0616
     4     -348.5424             nan     0.1000    0.0333
     5     -348.6732             nan     0.1000    0.0850
     6     -348.7744             nan     0.1000    0.0610
     7     -348.8795             nan     0.1000    0.0633
     8     -348.9132             nan     0.1000   -0.0109
     9     -348.9200             nan     0.1000   -0.0212
    10     -349.0271             nan     0.1000    0.0267

Error in gbm(fm2, distribution = "poisson", data = Insurance, n.trees = 10,  : 
  unused argument (offset = offset2)

My question is: what am I doing wrong? Also, is there another way? I noticed a weights parameter in the gbm function. Should I use that?
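On the weights idea, here is a minimal sketch of how an offset and exposure weights relate. It deliberately swaps in an ordinary Poisson GLM (base R `glm`) instead of gbm, because the relationship is exact and easy to check there: modelling the claim count with `offset(log(Holders))` gives the same coefficients as modelling the claim rate `Claims / Holders` with `weights = Holders`. The names `fit_offset` and `fit_rate` are made up for illustration, and whether gbm's `weights` argument behaves the same way is an assumption that this sketch does not verify.

```r
library(MASS)

data(Insurance)

# Count model: exposure enters as an offset on the log scale.
fit_offset <- glm(Claims ~ District + Group + Age + offset(log(Holders)),
                  family = poisson, data = Insurance)

# Rate model: claims per holder, weighted by the number of holders.
# glm warns about non-integer Poisson responses here; the fit is still valid.
fit_rate <- suppressWarnings(
  glm(Claims / Holders ~ District + Group + Age,
      family = poisson, weights = Holders, data = Insurance)
)

# The two parameterisations should agree up to numerical tolerance.
print(all.equal(coef(fit_offset), coef(fit_rate), tolerance = 1e-4))
```

So weights can stand in for an offset only if the response is also changed to a rate. Adding weights while keeping the raw counts as the response is a different model.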

Your first suggestion works if you specify a training fraction (train.fraction) less than 1. The default is 1, which means there is no validation set, so the validation deviance is reported as nan.

library(MASS)
library(gbm)

data(Insurance)

# Try using offset in the formula.
fm1 = formula(Claims ~ District + Group + Age + offset(log(Holders)))

fitgbm1 = gbm(fm1, distribution = "poisson",
              data = Insurance,
              n.trees = 10,
              shrinkage = 0.1,
              verbose = TRUE,
              train.fraction = .75)

results in

Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1     -428.8293       -105.1735     0.1000    0.0888
     2     -429.0869       -105.3063     0.1000    0.0708
     3     -429.1805       -105.3941     0.1000    0.0486
     4     -429.3414       -105.4816     0.1000    0.0933
     5     -429.4934       -105.5432     0.1000    0.0566
     6     -429.6714       -105.5188     0.1000    0.1212
     7     -429.8470       -105.5200     0.1000    0.0833
     8     -429.9655       -105.6073     0.1000    0.0482
     9     -430.1367       -105.6003     0.1000    0.0473
    10     -430.2462       -105.6100     0.1000    0.0487
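As for the second attempt in the question, the "unused argument" error is consistent with `gbm()` not having an `offset` argument in its signature, while the lower-level `gbm.fit()` does. The sketch below checks that with `formals()` and then passes the offset directly to `gbm.fit()`. It assumes a gbm 2.1.x-style API; the names `x`, `fitgbm3`, `link_pred` and `expected_claims` are invented for illustration.

```r
library(MASS)
library(gbm)

data(Insurance)

# Inspect the signatures: is "offset" a formal argument of each function?
print("offset" %in% names(formals(gbm)))
print("offset" %in% names(formals(gbm.fit)))

set.seed(1)  # gbm subsamples rows (bag.fraction), so fix the seed

# gbm.fit takes predictors and response separately instead of a formula.
x <- Insurance[, c("District", "Group", "Age")]

fitgbm3 <- gbm.fit(x = x, y = Insurance$Claims,
                   offset = log(Insurance$Holders),
                   distribution = "poisson",
                   n.trees = 10,
                   shrinkage = 0.1,
                   nTrain = floor(0.75 * nrow(Insurance)),  # rest is validation
                   verbose = FALSE)

# Assumption: predictions are on the link scale WITHOUT the offset,
# so exposure is multiplied back in by hand to get expected claim counts.
link_pred <- predict(fitgbm3, newdata = x, n.trees = 10)
expected_claims <- Insurance$Holders * exp(link_pred)
```

The same caveat should apply to the formula version: as far as I know, `predict()` on a gbm model does not add the offset back, so expected counts need the `Holders * exp(...)` step either way.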
