
R gbm logistic regression

I was hoping to use the gbm package to do logistic regression, but it is giving answers slightly outside the 0-1 range. I've tried the suggested distribution parameters for 0-1 predictions (bernoulli and adaboost), but that actually makes things worse than using gaussian.

library(gbm)

GBM_NTREES = 150
GBM_SHRINKAGE = 0.1
GBM_DEPTH = 4
GBM_MINOBS = 50

GBM_model <- gbm.fit(
  x = trainDescr,
  y = trainClass,
  distribution = "gaussian",
  n.trees = GBM_NTREES,
  shrinkage = GBM_SHRINKAGE,
  interaction.depth = GBM_DEPTH,
  n.minobsinnode = GBM_MINOBS,
  verbose = TRUE)
# Iter   TrainDeviance   ValidDeviance   StepSize   Improve
#      1        0.0603             nan     0.1000    0.0019
#      2        0.0588             nan     0.1000    0.0016
#      3        0.0575             nan     0.1000    0.0013
#      4        0.0563             nan     0.1000    0.0011
#      5        0.0553             nan     0.1000    0.0010
#      6        0.0546             nan     0.1000    0.0008
#      7        0.0539             nan     0.1000    0.0007
#      8        0.0533             nan     0.1000    0.0006
#      9        0.0528             nan     0.1000    0.0005
#     10        0.0524             nan     0.1000    0.0004
#    100        0.0484             nan     0.1000    0.0000
#    150        0.0481             nan     0.1000   -0.0000

prediction <- predict.gbm(object = GBM_model,
                          newdata = testDescr,
                          n.trees = GBM_NTREES)
hist(prediction)
range(prediction)
# [1] -0.02945224  1.00706700

Bernoulli:

GBM_model <- gbm.fit(
  x = trainDescr,
  y = trainClass,
  distribution = "bernoulli",
  n.trees = GBM_NTREES,
  shrinkage = GBM_SHRINKAGE,
  interaction.depth = GBM_DEPTH,
  n.minobsinnode = GBM_MINOBS,
  verbose = TRUE)
prediction <- predict.gbm(object = GBM_model,
                          newdata = testDescr,
                          n.trees = GBM_NTREES)
hist(prediction)
range(prediction)
# [1] -4.699324  3.043440

And adaboost:

GBM_model <- gbm.fit(
  x = trainDescr,
  y = trainClass,
  distribution = "adaboost",
  n.trees = GBM_NTREES,
  shrinkage = GBM_SHRINKAGE,
  interaction.depth = GBM_DEPTH,
  n.minobsinnode = GBM_MINOBS,
  verbose = TRUE)
prediction <- predict.gbm(object = GBM_model,
                          newdata = testDescr,
                          n.trees = GBM_NTREES)
hist(prediction)
range(prediction)
# [1] -3.0374228  0.9323279

Am I doing something wrong? Do I need to preProcess (scale, center) the data, or do I need to go in and manually floor/cap the values with something like:

prediction <- ifelse(prediction < 0, 0, prediction)
prediction <- ifelse(prediction > 1, 1, prediction)

From ?predict.gbm:

Returns a vector of predictions. By default the predictions are on the scale of f(x). For example, for the Bernoulli loss the returned value is on the log odds scale, poisson loss on the log scale, and coxph is on the log hazard scale.

If type="response" then gbm converts back to the same scale as the outcome. Currently the only effect this will have is returning probabilities for bernoulli and expected counts for poisson. For the other distributions "response" and "link" return the same.
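A minimal base-R sketch of the link-to-response mapping described in this excerpt, applied to the log-odds range reported for the Bernoulli fit in the question. `link_to_response` is a hypothetical helper, not a gbm function, and it only covers the bernoulli and poisson cases named above; `plogis` is the logistic function from R's stats package.

```r
# Hypothetical helper, NOT part of gbm: converts predictions on the link
# scale (f(x), what predict.gbm returns by default) to the response scale,
# following the mapping described in ?predict.gbm.
link_to_response <- function(f, distribution) {
  switch(distribution,
         bernoulli = plogis(f),  # log odds -> probability: 1 / (1 + exp(-f))
         poisson   = exp(f),     # log scale -> expected count
         f)                      # any other distribution: link == response
}

# Range of the Bernoulli predictions reported in the question (log odds)
bernoulli_link <- c(-4.699324, 3.043440)
p <- link_to_response(bernoulli_link, "bernoulli")
print(p)  # both values now lie strictly between 0 and 1
```

With a fitted model you would not need a helper like this: passing type = "response" to predict.gbm (together with newdata and n.trees) performs the same conversion for bernoulli and poisson.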

If you use distribution="bernoulli", the predictions are log odds, so you need to transform them to rescale them to [0, 1]: p <- plogis(predict.gbm(GBM_model, newdata = testDescr, n.trees = GBM_NTREES)). Using distribution="gaussian" is really for regression as opposed to classification. Its predictions can still fall slightly outside [0, 1] even though gbm is based on trees: a single regression tree can only predict values within the range of the training outcomes, but a boosted model adds up many shrunken trees, each fitted to the residuals of the ones before it, and that sum is not bounded by the training range.
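To illustrate why a gaussian boosted model can stray outside [0, 1], here is a toy least-squares booster with decision stumps, written from scratch in base R. It is an illustrative sketch under simplifying assumptions, not gbm's actual implementation; the data and the names `fit_stump` and `boost_stumps` are hypothetical. With y = x1 AND x2, no sum of single-feature stumps fits the data exactly, so the boosted fit settles near the best additive least-squares fit, whose predictions are roughly -0.25, 0.25, 0.25 and 0.75.

```r
# Toy least-squares boosting with decision stumps (NOT gbm's code).
# Four observations, two binary features; y = x1 AND x2, all in {0, 1}.
X <- data.frame(x1 = c(0, 0, 1, 1), x2 = c(0, 1, 0, 1))
y <- c(0, 0, 0, 1)

# Stump on one binary feature: each leaf predicts the mean residual in it
fit_stump <- function(x, r) {
  ifelse(x == 1, mean(r[x == 1]), mean(r[x == 0]))
}

boost_stumps <- function(X, y, n_trees = 200, shrinkage = 0.1) {
  f <- rep(mean(y), length(y))      # start from the mean of y
  for (i in seq_len(n_trees)) {
    r <- y - f                       # residuals: negative gradient of squared error / 2
    best <- NULL
    best_sse <- Inf
    for (j in names(X)) {            # keep the stump that reduces the SSE most
      fitted <- fit_stump(X[[j]], r)
      sse <- sum((r - fitted)^2)
      if (sse < best_sse) {
        best_sse <- sse
        best <- fitted
      }
    }
    f <- f + shrinkage * best        # add the shrunken tree to the ensemble
  }
  f
}

pred <- boost_stumps(X, y)
print(round(pred, 3))
```

Each individual stump only predicts mean residuals, and a single tree grown on both features would predict leaf means of y, which stay within [0, 1]; the negative prediction for the first observation comes from adding the trees together.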

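Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.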

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM