Is it necessary to call fit() again after training in sklearn?

Question

I am using LinearRegression(). Below you can see what I have already done to predict values for new features:

    lm = LinearRegression()
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.8, random_state=say)
    lm.fit(X_train, y_train)
    lm.predict(X_test)
    scr = lm.score(X_test, y_test)
    lm.fit(X, y)
    pred = lm.predict(X_real)

Do I really need the line lm.fit(X, y), or can I go without it? Also, if I don't need to calculate accuracy, is the following approach better than splitting the data into training and test sets? (In case I don't want to test):

    lm.fit(X, y)
    pred = lm.predict(X_real)

Even though I am getting 0.997 accuracy, the predicted values are not close, or are shifted. Are there ways to make the predictions more accurate?

Answer 1

You don't need to fit multiple times to predict a value from given features, since your algorithm has already learned from your training set. Check the code below.

    # Split your data into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.8, random_state=say)

    # Fit the model on the training set
    lm = LinearRegression()
    lm.fit(X_train, y_train)

    # The fitted model can now predict
    pred = lm.predict(X_real)

    # Use the test set to see how well it predicts
    scr = lm.score(X_test, y_test)
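
To make the point above concrete, here is a minimal, self-contained sketch. The data is synthetic and the shapes, the coefficients, and the X_real rows are assumptions for illustration, since the question does not show its data. It fits once, then reuses the same fitted model for both scoring and prediction. Note that for LinearRegression, score() returns the R² coefficient of determination rather than a classification-style accuracy.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; the question's X, y, X_real are not shown).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
X_real = rng.normal(size=(5, 3))  # hypothetical new rows to predict

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

lm = LinearRegression()
lm.fit(X_train, y_train)        # fit once on the training set

scr = lm.score(X_test, y_test)  # R^2 on held-out data
pred = lm.predict(X_real)       # no second fit is needed to predict
coef_before = lm.coef_.copy()

# Calling fit() again does not continue training: it re-estimates the
# coefficients from scratch on whatever data is passed in.
lm.fit(X, y)
coef_after = lm.coef_

print(f"held-out R^2: {scr:.3f}")
print("predictions:", pred)
```

If a second fit on all of X and y is used at all, it is best treated as a separate final step after the held-out score has been recorded, since that score describes the model trained on the training set only.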

Answer 2

The reason you are getting an almost 100% accuracy score is data leakage, caused by the following line of code:

    lm.fit(X, y)

In the line above you gave your model ALL the data, and then you test its predictions on a subset of data that the model has already seen.

This causes a very high accuracy score on the already seen data, but the model usually performs badly on unseen data.
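
The effect described above can be sketched on synthetic data. Everything in this example is an assumption chosen to make the gap visible (few samples, many mostly irrelevant features); it is not the asker's data. It scores the same test rows twice: once with a model fitted only on the training split, and once with a model fitted on all the data, test rows included.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Small, noisy synthetic dataset (hypothetical): 60 rows, 25 features,
# only the first feature actually drives the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 25))
y = 2.0 * X[:, 0] + rng.normal(size=60)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Honest evaluation: the test rows were never seen during fitting.
honest = LinearRegression().fit(X_train, y_train).score(X_test, y_test)

# Leaky evaluation: the model was fitted on all rows, including the test rows.
leaky = LinearRegression().fit(X, y).score(X_test, y_test)

print(f"R^2 on unseen test rows:       {honest:.3f}")
print(f"R^2 on already-seen test rows: {leaky:.3f}")
```

The size of the gap depends on how easily the model can overfit. With many samples and few features, as in a well-behaved linear problem, the two scores can be close.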

When do you want or need to fit your model multiple times?

If you receive new training data and want to improve your model by training it on that new portion of data, you may want to choose a regression algorithm that supports incremental learning.

In this case you will use the model.partial_fit() method instead of model.fit() ...
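
As a rough illustration of incremental learning, the sketch below uses SGDRegressor, one of the scikit-learn regressors that provides partial_fit(). LinearRegression itself does not offer that method. The batch generator, batch sizes, and coefficients are hypothetical stand-ins for data arriving over time.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor

rng = np.random.default_rng(0)
true_coef = np.array([1.5, -2.0, 0.5])  # hypothetical ground truth


def make_batch(n):
    """Return a synthetic batch standing in for newly arrived data."""
    X = rng.normal(size=(n, 3))
    y = X @ true_coef + rng.normal(scale=0.1, size=n)
    return X, y


X_val, y_val = make_batch(200)  # held-out rows used only for scoring

model = SGDRegressor(random_state=0)

# First portion of data.
X0, y0 = make_batch(100)
model.partial_fit(X0, y0)
score_first = model.score(X_val, y_val)

# Later portions: each call updates the existing coefficients
# instead of starting over, unlike fit().
for _ in range(9):
    X_new, y_new = make_batch(100)
    model.partial_fit(X_new, y_new)
score_after = model.score(X_val, y_val)

# LinearRegression has no partial_fit(), so it cannot be updated this way.
has_partial_fit = hasattr(LinearRegression(), "partial_fit")

print(f"R^2 after first batch: {score_first:.3f}")
print(f"R^2 after all batches: {score_after:.3f}")
print("LinearRegression has partial_fit:", has_partial_fit)
```

SGD-based models are sensitive to feature scale, so real data would normally be standardized first; the synthetic features here are already on a unit scale.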

2 answers

Answer 1
2 2018-03-24 16:59:18

Answer 2
1 2018-03-25 11:01:06
