
R prediction within an interval

A quick question on prediction.

The value I'm trying to predict is either 0 or 1 (it is set as numeric, not as a factor) so when I run my random forest:

fit <- randomForest(PredictValue ~ <variables>, data=trainData, ntree=50) 

and predict:

pred<-predict(fit, testData)

all my predictions are between 0 and 1, which is what I expect and, I imagine, can be interpreted as the probability of being 1.

Now, if I go through the same process using the gbm algorithm:

fitgbm <- gbm(PredictValue~ <variables>, data=trainData, distribution = "bernoulli", n.trees = 500,   bag.fraction = 0.75, cv.folds = 5, interaction.depth = 3)
predgbm <- predict(fitgbm, testData)

the values range from -0.5 to 0.5.

I also tried glm and the range was worse, from around -3 to 3.

So, my question is: is it possible to set the algorithms to predict between 0 and 1?


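By default, predict() for gbm (and for glm) returns values on the link scale, which for a Bernoulli model is the log-odds, so they are not confined to the interval from 0 to 1. To get probabilities, you need to specify type='response'.

Check this example: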

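library(gbm)

# Toy data: 100 zeros followed by 100 ones, with an uninformative predictor
y <- rep(c(0,1),c(100,100))
x <- runif(200)
df <- data.frame(y,x)

fitgbm <- gbm(y ~ x, data=df, 
              distribution = "bernoulli", n.trees = 100)

# type='response' returns probabilities instead of log-odds
predgbm <- predict(fitgbm, df, n.trees=100, type='response')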

The example is too simplistic to be a useful model, but look at the summary of predgbm:

> summary(predgbm)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.4936  0.4943  0.5013  0.5000  0.5052  0.5073 

And, as the documentation mentions, this is the probability of y being 1:

If type="response" then gbm converts back to the same scale as the outcome. Currently the only effect this will have is returning probabilities for bernoulli and expected counts for poisson.
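
The same idea should apply to the glm case from the question, assuming the model was fitted with family = binomial (the question does not show that call, so this is a guess). The sketch below uses only base R and simulated data; the names fitglm, link and prob are made up for illustration. It shows that the default predictions are log-odds and that type = "response" maps them into the 0 to 1 range through the logistic function.

```r
# Sketch using base R only (stats::glm); the data are simulated for illustration.
set.seed(1)
x <- runif(200)
y <- rbinom(200, size = 1, prob = plogis(2 * x - 1))
df <- data.frame(y, x)

# Assumption: a logistic regression, i.e. family = binomial
fitglm <- glm(y ~ x, data = df, family = binomial)

# Default is type = "link": predictions are log-odds and can be any real number
link <- predict(fitglm, df)

# type = "response" applies the inverse logit, giving probabilities in (0, 1)
prob <- predict(fitglm, df, type = "response")

# The two scales are related by the logistic function plogis()
print(all.equal(plogis(link), prob))
print(range(prob))
```

If you already have link-scale predictions, applying plogis() to them gives the same probabilities without calling predict() again.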

