
R - GBM | Data making trained GBM model very heavy

We are using a GBM model trained on very voluminous data, ~15 GB. The trained model becomes huge, ~17 GB. Inspecting the trained model, we see the training data saved along with the trees and other model details, accounting for around 96% of the total model size.

Is the data stored in the trained model of any use, specifically for prediction? We are saving the model and reloading it for prediction, which takes a very long time.
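(For reference, the per-component breakdown can be checked directly in R; the sketch below is not from the original question and assumes a fitted gbm object named model, sizing each list component with object.size:)

library(gbm)
sizes = sapply(model, object.size)                            # bytes used by each component of the gbm object
round(sort(sizes, decreasing = TRUE) / sum(sizes) * 100, 1)   # percentage share of each component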

If you are using the gbm library in R, then use gbm.fit and set keep.data = FALSE:

library(gbm)

label = as.numeric(iris$Species == "setosa")
trn = sample(nrow(iris), 100)
fit = gbm.fit(x = iris[trn, -5], y = label[trn], shrinkage = 0.1, keep.data = FALSE)
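As a rough illustration (not from the original answer; fit_full is a hypothetical name), refitting the same model with keep.data = TRUE and comparing in-memory sizes should show that the stored copy of the training data accounts for most of the difference:

fit_full = gbm.fit(x = iris[trn, -5], y = label[trn], shrinkage = 0.1, keep.data = TRUE)
format(object.size(fit_full), units = "Kb")   # larger: trees plus a copy of the training data
format(object.size(fit), units = "Kb")        # smaller: trees and fit details only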

Calling predict without supplying newdata then fails, because there is no stored data:

predict(fit,n.trees = 10,type="response")
Error in reconstructGBMdata(object) : 
  Cannot reconstruct data from gbm object. gbm() was called with keep.data=FALSE

You can, however, pass the data explicitly:

predict(fit, iris[, -5], n.trees = 10, type = "response")
predict(fit, iris[-trn, -5], n.trees = 10, type = "response")
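Since the slow part in the question is saving and reloading, here is a minimal sketch of persisting the slimmed model and predicting after reload (the file name is hypothetical):

saveRDS(fit, "gbm_fit.rds")     # model without training data, much smaller on disk
fit2 = readRDS("gbm_fit.rds")
predict(fit2, iris[-trn, -5], n.trees = 10, type = "response")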
