How to retrain a linear regression model with a new subset using TensorFlow or Keras?

I have 100 GB of data and have divided it into small subsets. I want to train the model incrementally, feeding it one new subset at a time until it has been trained on all of them. How can I achieve this with TensorFlow or sklearn?

Some scikit-learn models support incremental learning through the partial_fit method. A popular choice is stochastic gradient descent (SGDRegressor for regression), which minimizes a loss function by updating the weights one data sample at a time. Here is an example, assuming you have two chunks of data, (X1, y1) and (X2, y2), that you can load into memory one after the other.

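from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

# Both estimators expose partial_fit, so they can be updated chunk by chunk.
scaler = StandardScaler()
sgd = SGDRegressor(random_state=42)

# Chunk 1: update the running mean/variance, scale, then run one SGD pass.
X1_scaled = scaler.partial_fit(X1).transform(X1)
sgd.partial_fit(X1_scaled, y1)

# Chunk 2: same steps; the scaler statistics now reflect both chunks.
X2_scaled = scaler.partial_fit(X2).transform(X2)
sgd.partial_fit(X2_scaled, y2)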
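
The same pattern extends to any number of subsets by looping over them. Below is a minimal sketch, not a definitive implementation: iter_chunks is a hypothetical loader that yields synthetic (X, y) pairs here and would read one subset from disk in practice, and the chunk count, chunk size, and coefficients are made-up values. It assumes you can afford to read the data twice, once to fit the scaler and once to train.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler


def iter_chunks(n_chunks=20, chunk_size=500, seed=0):
    """Hypothetical chunk loader: yields (X, y) pairs of synthetic data.

    In practice this would read one subset from disk at a time.
    """
    rng = np.random.default_rng(seed)
    true_coef = np.array([2.0, -1.0, 0.5])  # made-up ground truth
    for _ in range(n_chunks):
        X = rng.normal(loc=5.0, scale=3.0, size=(chunk_size, 3))
        y = X @ true_coef + 4.0 + rng.normal(scale=0.1, size=chunk_size)
        yield X, y


# Pass 1: accumulate feature means and variances over every chunk.
scaler = StandardScaler()
for X, _ in iter_chunks():
    scaler.partial_fit(X)

# Pass 2: train incrementally, scaling every chunk with the same statistics.
sgd = SGDRegressor(random_state=42)
for X, y in iter_chunks():
    sgd.partial_fit(scaler.transform(X), y)

# Evaluate on a fresh chunk drawn from the same synthetic distribution.
X_test, y_test = next(iter_chunks(n_chunks=1, seed=123))
r2 = sgd.score(scaler.transform(X_test), y_test)
print(f"R^2 on held-out chunk: {r2:.4f}")
```

Fitting the scaler in a separate first pass means every chunk is scaled with the same mean and variance. If a second read of the data is too expensive, the single-pass approach above still works, but early chunks are scaled with statistics estimated from less data than later ones.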

