
Does fitting a sklearn Linear Regression classifier multiple times add data points or just replace them?

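import numpy as np
from sklearn.model_selection import train_test_split

# df, label, forecast_out and linReg are defined earlier (not shown)
X = np.array(df.drop(columns=[label]))  # features: every column except the label
X_lately = X[-forecast_out:]  # most recent rows, held back from training
X = X[:-forecast_out]
df.dropna(inplace=True)  # drop rows that contain NaN values
y = np.array(df[label])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

linReg.fit(X_train, y_train)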

I've been fitting my linear regression classifier over and over again with data from different spreadsheets, under the assumption that every time I fit the same model with a new spreadsheet, it adds points and makes the model more robust.

Was this assumption correct? Or am I just wiping the model every time I fit it?

If so, is there a way for me to fit my model multiple times to get this 'cumulative' type of effect?

sklearn's LinearRegression is a batch (a.k.a. offline) estimator: you can't add knowledge from new patterns to an already fitted model. Each call to fit() re-fits the whole model using only the data passed in that call. To add data with this estimator, append the new patterns to your original training X, y matrices and re-fit.
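A minimal sketch of this behavior, assuming synthetic data in place of the spreadsheets (the arrays X_a, y_a, X_b, y_b and their slopes are made up for illustration, not taken from the question):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Two hypothetical "spreadsheets" that follow different linear relationships
X_a = rng.uniform(0, 10, size=(50, 1))
y_a = 2.0 * X_a[:, 0] + 1.0
X_b = rng.uniform(0, 10, size=(50, 1))
y_b = -3.0 * X_b[:, 0] + 5.0

model = LinearRegression()

model.fit(X_a, y_a)
coef_after_a = model.coef_.copy()

# Fitting the same object again uses only the new data
model.fit(X_b, y_b)
coef_after_b = model.coef_.copy()

# A brand-new model fitted on B alone, for comparison
fresh_b = LinearRegression().fit(X_b, y_b)

# The "cumulative" version: stack both datasets and fit once
X_all = np.vstack([X_a, X_b])
y_all = np.concatenate([y_a, y_b])
combined_coef = LinearRegression().fit(X_all, y_all).coef_.copy()

print("after A:", coef_after_a)
print("after B:", coef_after_b, "fresh model on B:", fresh_b.coef_)
print("A and B stacked:", combined_coef)
```

If the second fit had accumulated data, the coefficient after fitting B would differ from that of a fresh model fitted on B alone; here the two should match, while the stacked fit gives a different slope.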

You're almost certainly wiping your model and starting from scratch. To do what you want, you need to append the additional data to the bottom of your data frame and re-fit using that.
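One way this could look with pandas, as a rough sketch: the two small data frames and the column names "feature" and "label" are hypothetical stand-ins for the spreadsheets, which in practice would be loaded from files.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-ins for two spreadsheets with identical columns
sheet_1 = pd.DataFrame({"feature": [1.0, 2.0, 3.0], "label": [2.0, 4.0, 6.0]})
sheet_2 = pd.DataFrame({"feature": [4.0, 5.0, 6.0], "label": [8.0, 10.0, 12.0]})

# Append the new rows below the old ones and renumber the index
df_all = pd.concat([sheet_1, sheet_2], ignore_index=True)

X = df_all[["feature"]].to_numpy()  # 2-D feature matrix
y = df_all["label"].to_numpy()      # 1-D target vector

# A single fit on the combined data uses every row from both sheets
linReg = LinearRegression().fit(X, y)
print(linReg.coef_, linReg.intercept_)
```

Each time a new spreadsheet arrives, it would be concatenated onto df_all and the model re-fitted on the full frame.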
