
R - GBM | Data making trained GBM model very heavy

We are using a GBM model trained on very voluminous data, ~15 GB. The trained model becomes huge, ~17 GB. Inspecting the trained model, we see the training data saved along with the trees and other model details, accounting for around 96% of the total model size.

Is the data stored in the trained model of any use, specifically for prediction? We are saving the model and reloading it for prediction, which takes a very long time.
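(For reference, the per-component breakdown can be checked directly in R; the sketch below is not from the original question and assumes a fitted gbm object named model, sizing each list component with object.size:)

library(gbm)
sizes = sapply(model, object.size)                            # bytes used by each component of the gbm object
round(sort(sizes, decreasing = TRUE) / sum(sizes) * 100, 1)   # percentage share of each component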

If you are using the gbm library in R, then use gbm.fit and set keep.data = FALSE:

library(gbm)

label = as.numeric(iris$Species == "setosa")
trn = sample(nrow(iris), 100)
fit = gbm.fit(x = iris[trn, -5], y = label[trn], shrinkage = 0.1, keep.data = FALSE)
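As a rough illustration (not from the original answer; fit_full is a hypothetical name), refitting the same model with keep.data = TRUE and comparing in-memory sizes should show that the stored copy of the training data accounts for most of the difference:

fit_full = gbm.fit(x = iris[trn, -5], y = label[trn], shrinkage = 0.1, keep.data = TRUE)
format(object.size(fit_full), units = "Kb")   # larger: trees plus a copy of the training data
format(object.size(fit), units = "Kb")        # smaller: trees and fit details only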

Calling predict without supplying newdata then fails, because there is no stored data:

predict(fit,n.trees = 10,type="response")
Error in reconstructGBMdata(object) : 
  Cannot reconstruct data from gbm object. gbm() was called with keep.data=FALSE

You can, however, pass the data explicitly:

predict(fit, iris[, -5], n.trees = 10, type = "response")
predict(fit, iris[-trn, -5], n.trees = 10, type = "response")
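Since the slow part in the question is saving and reloading, here is a minimal sketch of persisting the slimmed model and predicting after reload (the file name is hypothetical):

saveRDS(fit, "gbm_fit.rds")     # model without training data, much smaller on disk
fit2 = readRDS("gbm_fit.rds")
predict(fit2, iris[-trn, -5], n.trees = 10, type = "response")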
