Load and predict new data sklearn

I trained a logistic regression model, cross-validated it, and saved it to a file using the joblib module. Now I want to load this model and predict new data with it. Is this the correct way to do it, especially the standardization? Should I use scaler.fit() on my new data too? In the tutorials I followed, scaler.fit was only used on the training set, so I'm a bit lost here.

Here is my code:

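# Load the saved model with joblib
import joblib
import pandas as pd
from sklearn.preprocessing import StandardScaler

model = joblib.load('model.pkl')

# New data to predict (the last column is left out of the features)
pr = pd.read_csv('set_to_predict.csv')
pred_cols = list(pr.columns.values)[:-1]

# Standardize new data (the step in question: the scaler is refit here)
scaler = StandardScaler()
X_pred = scaler.fit(pr[pred_cols]).transform(pr[pred_cols])

pred = pd.Series(model.predict(X_pred))
print(pred)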

No, it's incorrect. All data preparation steps should be fit on the training data only. Otherwise, you risk applying the wrong transformations, because the means and variances that StandardScaler estimates will probably differ between the training data and the new data.
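
To make the effect concrete, here is a minimal plain-Python sketch of the arithmetic behind standardization. The numbers are made up for illustration, and the helper names `fit_scaler` and `transform` are hypothetical, not part of sklearn. It uses the population standard deviation, which is what StandardScaler computes.

```python
from statistics import fmean, pstdev


def fit_scaler(values):
    """Return the (mean, std) that standardization would learn from `values`."""
    return fmean(values), pstdev(values)


def transform(values, mean, std):
    """Standardize `values` with statistics that were learned elsewhere."""
    return [(v - mean) / std for v in values]


# Made-up feature values: the new data is shifted upwards relative to training.
train = [10.0, 20.0, 30.0]
new = [30.0, 40.0, 50.0]

train_mean, train_std = fit_scaler(train)
train_scaled = transform(train, train_mean, train_std)

# Correct: reuse the statistics learned on the training data.
new_ok = transform(new, train_mean, train_std)

# Incorrect: refit on the new data, which hides the shift from the model.
new_mean, new_std = fit_scaler(new)
new_wrong = transform(new, new_mean, new_std)

print(train_scaled)
print(new_ok)
print(new_wrong)
```

With the training statistics, the raw value 30 gets the same scaled value whether it appears in the training set or in the new data. After refitting, the new 30 is mapped to the value the model saw for a training 10, so the model would be fed inputs on a different scale from the one it was trained on.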

The easiest way to train, save, load and apply all the steps together is to use a Pipeline:

At training:

# prepare the pipeline
import joblib
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# the scaler and the classifier are fit together, on the training data only
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)
joblib.dump(pipe, 'model.pkl')

At prediction:

# Load the saved pipeline with joblib
import joblib
import pandas as pd

pipe = joblib.load('model.pkl')

# New data to predict
pr = pd.read_csv('set_to_predict.csv')
pred_cols = list(pr.columns.values)[:-1]

# apply the whole pipeline to data
pred = pd.Series(pipe.predict(pr[pred_cols]))
print(pred)
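
For a quick check that the saved pipeline really carries the training statistics with it, the round trip can be sketched end to end on synthetic data. Everything about the data here is an assumption for illustration (random features, a made-up label rule, a temporary file instead of the real 'model.pkl'); only the pipeline construction mirrors the answer above.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the real training set and the new data.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
y_train = (X_train[:, 0] > 50.0).astype(int)  # made-up label rule
X_new = rng.normal(loc=50.0, scale=10.0, size=(5, 3))

# Fit the scaler and the classifier together on the training data.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)

# Save and reload the whole pipeline through a temporary file.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")
    joblib.dump(pipe, path)
    loaded = joblib.load(path)

# make_pipeline names each step after its lowercased class name.
scaler = loaded.named_steps["standardscaler"]
pred_before = pipe.predict(X_new)
pred_after = loaded.predict(X_new)

print(scaler.mean_)  # means learned from X_train, not from X_new
print(pred_after)
```

The reloaded scaler still holds the means estimated from X_train, and predict() only calls transform on the new data, so nothing is refit at prediction time.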
