How can I offset exposures in a gbm model in R?
I am trying to fit a gradient boosting machine (GBM) to insurance claims. The observations have unequal exposure, so I am trying to use an offset equal to the log of exposure. I tried two different ways:
1. Put an offset term in the formula. This resulted in nan for the validation deviance at every iteration.
2. Use the offset parameter in the gbm function. This parameter is listed under gbm.more. This results in an error message saying that there is an unused argument.
I can't share my company's data, but I reproduced the problem using the Insurance data table in the MASS package. See the code and output below.
library(MASS)
library(gbm)
data(Insurance)
# Try using offset in the formula.
fm1 = formula(Claims ~ District + Group + Age + offset(log(Holders)))
fitgbm1 = gbm(fm1, distribution = "poisson",
data = Insurance,
n.trees = 10,
shrinkage = 0.1,
verbose = TRUE)
# Try using offset in the gbm statement.
fm2 = formula(Claims ~ District + Group + Age)
offset2 = log(Insurance$Holders)
fitgbm2 = gbm(fm2, distribution = "poisson",
data = Insurance,
n.trees = 10,
shrinkage = 0.1,
offset = offset2,
verbose = TRUE)
This outputs:
> source('D:/Rprojects/auto_tutorial/rcode/example_gbm.R')
Iter TrainDeviance ValidDeviance StepSize Improve
1 -347.8959 nan 0.1000 0.0904
2 -348.2181 nan 0.1000 0.0814
3 -348.3845 nan 0.1000 0.0616
4 -348.5424 nan 0.1000 0.0333
5 -348.6732 nan 0.1000 0.0850
6 -348.7744 nan 0.1000 0.0610
7 -348.8795 nan 0.1000 0.0633
8 -348.9132 nan 0.1000 -0.0109
9 -348.9200 nan 0.1000 -0.0212
10 -349.0271 nan 0.1000 0.0267
Error in gbm(fm2, distribution = "poisson", data = Insurance, n.trees = 10, :
unused argument (offset = offset2)
My question is: what am I doing wrong? Also, is there another way? I noticed a weights parameter in the gbm function. Should I use that?
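On the weights question: in an ordinary Poisson GLM, modelling the claim rate Claims / Holders with weights = Holders gives the same coefficient estimates as modelling Claims with offset(log(Holders)), because the two sets of score equations are identical. The sketch below checks this with glm() on the same Insurance data; quasipoisson is used only to avoid warnings about a non-integer response, and the object names are illustrative. gbm's "poisson" distribution may reject a non-integer response, so this rate-plus-weights form is not guaranteed to carry over to gbm; there, the offset is the more direct route.

```r
library(MASS)  # provides the Insurance data

data(Insurance)

# Poisson GLM on claim counts, with log(exposure) as an offset
fit_offset <- glm(Claims ~ District + Group + Age + offset(log(Holders)),
                  family = poisson, data = Insurance)

# The same model written as a claim rate weighted by exposure.
# quasipoisson fits the same mean model (same coefficients as poisson)
# but does not warn about the non-integer response Claims / Holders.
fit_weights <- glm(Claims / Holders ~ District + Group + Age,
                   family = quasipoisson, weights = Holders,
                   data = Insurance)

# TRUE if the two fits agree up to numerical tolerance
same_coefs <- isTRUE(all.equal(coef(fit_offset), coef(fit_weights),
                               tolerance = 1e-6))
print(same_coefs)
```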
Your first approach works if you specify a training fraction (train.fraction) less than 1. The default is 1, which means there is no validation set, so the validation deviance cannot be computed and is printed as nan.
library(MASS)
library(gbm)
data(Insurance)
# Try using offset in the formula.
fm1 = formula(Claims ~ District + Group + Age + offset(log(Holders)))
fitgbm1 = gbm(fm1, distribution = "poisson",
data = Insurance,
n.trees = 10,
shrinkage = 0.1,
verbose = TRUE,
train.fraction = .75)
This results in:
Iter TrainDeviance ValidDeviance StepSize Improve
1 -428.8293 -105.1735 0.1000 0.0888
2 -429.0869 -105.3063 0.1000 0.0708
3 -429.1805 -105.3941 0.1000 0.0486
4 -429.3414 -105.4816 0.1000 0.0933
5 -429.4934 -105.5432 0.1000 0.0566
6 -429.6714 -105.5188 0.1000 0.1212
7 -429.8470 -105.5200 0.1000 0.0833
8 -429.9655 -105.6073 0.1000 0.0482
9 -430.1367 -105.6003 0.1000 0.0473
10 -430.2462 -105.6100 0.1000 0.0487
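As for the second approach, the "unused argument" error indicates that gbm() itself does not accept an offset argument, so with gbm() the offset has to come from the formula. The lower-level gbm.fit() function does take an offset vector directly. A minimal sketch, assuming a gbm 2.x release in which gbm.fit() accepts offset and nTrain (older releases used train.fraction instead of nTrain); the variable names are illustrative:

```r
library(MASS)  # provides the Insurance data
library(gbm)

data(Insurance)

# Predictors, response, and log-exposure offset as separate objects
x   <- Insurance[, c("District", "Group", "Age")]
y   <- Insurance$Claims
off <- log(Insurance$Holders)

# nTrain = 48 trains on the first 48 of the 64 rows and validates
# on the remaining 16, mirroring train.fraction = 0.75 above
fitgbm3 <- gbm.fit(x = x, y = y,
                   offset = off,
                   distribution = "poisson",
                   n.trees = 10,
                   shrinkage = 0.1,
                   nTrain = 48,
                   verbose = TRUE)
```

Because gbm uses the first rows for training and the rest for validation, shuffling the rows beforehand may give a more representative hold-out set.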