How to retrain a linear regression model on new subsets using TensorFlow or Keras?

Question

I have 100 GB of data and have divided it into small subsets. I want to train a model incrementally, feeding it one new subset at a time until it has been trained on all of them. How can I achieve this with TensorFlow or sklearn?

Answer 1

Some scikit-learn models support incremental learning through the partial_fit method. A popular choice is stochastic gradient descent (SGD), which minimizes a loss function by updating the model one data sample at a time. Here is an example, assuming you have two chunks of data, (X1, y1) and (X2, y2), that you can load into memory one after the other.

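from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
sgd = SGDRegressor(random_state=42)

# First chunk: update the scaler's running statistics, scale, then take one SGD pass.
X1_scaled = scaler.partial_fit(X1).transform(X1)
sgd.partial_fit(X1_scaled, y1)

# Second chunk: same steps; the scaler and the model continue from their current state.
X2_scaled = scaler.partial_fit(X2).transform(X2)
sgd.partial_fit(X2_scaled, y2)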
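
With many subsets, the same two steps can go in a loop. The following is only a sketch: iter_chunks is a hypothetical stand-in that generates synthetic data (three features, made-up coefficients), and in practice it would be replaced by whatever loads one subset at a time from disk.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler


def iter_chunks(n_chunks=10, chunk_size=1000, seed=0):
    """Hypothetical loader: yields one (X, y) subset at a time.

    Synthetic data is used here so the sketch runs on its own.
    """
    rng = np.random.default_rng(seed)
    true_coef = np.array([2.0, -1.0, 0.5])  # made-up ground truth
    for _ in range(n_chunks):
        X = rng.normal(size=(chunk_size, 3))
        y = X @ true_coef + 3.0 + rng.normal(scale=0.1, size=chunk_size)
        yield X, y


scaler = StandardScaler()
sgd = SGDRegressor(random_state=42)

for X_chunk, y_chunk in iter_chunks():
    # Update the running mean/variance, then scale this chunk with them.
    X_scaled = scaler.partial_fit(X_chunk).transform(X_chunk)
    # One SGD pass over this chunk; earlier chunks are not revisited.
    sgd.partial_fit(X_scaled, y_chunk)

# Check the fit on a fresh chunk the model has not seen.
X_test, y_test = next(iter_chunks(n_chunks=1, seed=123))
r2 = sgd.score(scaler.transform(X_test), y_test)
print(f"R^2 on held-out chunk: {r2:.3f}")
```

Note that the first chunks are scaled with statistics computed from only the data seen so far. If that matters for your data, one option is a separate first pass that calls scaler.partial_fit on every subset before any training starts.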

Answer 1
1, accepted, 2020-12-14 10:49:07