How can I save a machine learning model so that it takes less memory?
I am training a RandomForest classifier on a somewhat large dataset of around 580 MB, and fitting takes more than 30 minutes. When I save the model using joblib, the saved model takes around 11.1 GB of space. Is this normal, or can I save the model more efficiently in terms of space consumed? I am thinking of deploying the model.
Is it worth using a model that takes so much space? I have a decision tree model on the same data that takes only 278 MB, and its accuracy is just 2% lower (91%).
This is the model-saving code:
import joblib  # sklearn.externals.joblib is deprecated; import joblib directly

# Save the model as a pickle in a file
joblib.dump(Random_classifier, '/content/drive/My Drive/Random_classifier.pkl')
I am a newbie, so please don't vote to close the question; just leave a comment and I am willing to edit the question ASAP.
The Random Forest classification method is very expensive in memory. Try lowering the number of decision trees (`n_estimators`); that may reduce the memory footprint. Your dataset is also very big, so a saved model of that size seems plausible.
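A minimal sketch on synthetic data of why this helps: a Random Forest's serialized size is dominated by the total number of tree nodes, so capping `n_estimators` and `max_depth` shrinks the saved file. The data and hyperparameter values below are illustrative, not tuned for your dataset.

```python
import pickle
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the real 580 MB dataset.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Unbounded forest: many deep trees, many nodes to serialize.
big = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Fewer, shallower trees: far fewer nodes, far fewer bytes on disk.
small = RandomForestClassifier(n_estimators=50, max_depth=12,
                               random_state=0).fit(X, y)

big_bytes = len(pickle.dumps(big))
small_bytes = len(pickle.dumps(small))
print(small_bytes < big_bytes)  # the capped forest pickles to much less
```

Check the accuracy after capping, of course: if limiting depth costs more than the 2% you already lose with the single decision tree, the smaller forest buys you nothing.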
I also know that the pickle module can be used to save the weights; I would recommend checking that out too.
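On top of either pickle or joblib, joblib can also compress the stream it writes, which often cuts the on-disk size of tree ensembles substantially without touching the model itself. A hedged sketch (the `compress=3` level is just a common speed/size trade-off, not a recommendation specific to your model):

```python
import os
import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic model standing in for the real classifier.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Same model, saved twice: uncompressed vs. zlib-compressed.
joblib.dump(clf, "clf.pkl")                # default: no compression
joblib.dump(clf, "clf.pkl.z", compress=3)  # 0-9; 3 balances speed and size

print(os.path.getsize("clf.pkl.z") < os.path.getsize("clf.pkl"))
```

`joblib.load` handles both files transparently, so the loading code at deployment time does not change.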