
Loading a saved model using Pickle - getting an error because fit_transform was done in the training program

I have created a first program that trains the algorithm and saves it.

Program 1

import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit 
from sklearn.impute import SimpleImputer  
from sklearn.tree import DecisionTreeRegressor # import for Decision Tree Algorithm
import pickle
from sklearn.preprocessing import StandardScaler

SourceData=pd.read_excel("ASML Stock Predict.xlsx") # Load the data into Pandas DataFrame
SourceData["Nasdaq Category"]=pd.cut(SourceData["Adj Close Nasdaq 100"],
                                     bins=[0., 4500, 5500, 6500, 7500,8500, 9500, 10500, np.inf],
                                     labels=[1, 2, 3, 4,5,6,7,8])

""" Split the data source into test and train subset """
split = StratifiedShuffleSplit(n_splits=1, test_size=0.01, random_state=42)
for train_index, test_index in split.split(SourceData, SourceData["Nasdaq Category"]):
    strat_train_set = SourceData.loc[train_index]  # stratified train dataset with all columns in original source data
    strat_test_set = SourceData.loc[test_index]  # stratified test dataset with all columns in original source data

""" Drop the new Nasdaq Category column from the data source after the train and test subsets are prepared """
for set_ in (strat_train_set, strat_test_set): 
    set_.drop("Nasdaq Category", axis=1, inplace=True)

DataSource_train_independent = strat_train_set.drop(["Date", "Adj Close ASML"], axis=1)  # Drop the date and the dependent variable, leaving the independent variables
DataSource_train_dependent = strat_train_set["Adj Close ASML"].copy()  # Series with only the dependent variable for the training dataset



imputer = SimpleImputer(strategy="median")  # declare an imputer that fills blank values with the median of each variable
imputer.fit(DataSource_train_independent)  # calculate the median of each independent variable

""" Scale the independent variables training set. No need to scale the dependent variable """
sc_X = StandardScaler()
X=sc_X.fit_transform(DataSource_train_independent.values) # scale the independent variables
# testdata is not defined in this program; the test data is loaded and scaled in Program 2
##sc_y = StandardScaler()
y=DataSource_train_dependent # scaling is not required for dependent variable


"""Decision Tree Regressor """

tree_reg = DecisionTreeRegressor()
tree_reg.fit(X,y)

filename = 'DecisionTree_TrainedModel.sav'
pickle.dump(tree_reg, open(filename, 'wb'))

Program 2

import pickle
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor # import for Decision Tree Algorithm

testdata=pd.read_excel("ASML Test  Stock Predict.xlsx") # Load the test data

sc_X = StandardScaler()
X_test=sc_X.transform(testdata.values) # scale the independent variables for test data



loaded_model = pickle.load(open('DecisionTree_TrainedModel.sav', 'rb'))
decision_predictions = loaded_model.predict(X_test) # Predict the value of dependent variable
print("The prediction by Decision Tree model is", decision_predictions)

Since I call "fit_transform" in Program 1 and save the model, in the second program I only transform the independent variables after loading the model.

When I run the second program I get this error message: "sklearn.exceptions.NotFittedError: This StandardScaler instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator."

Please advise. As I understand it, I only need to transform, not fit, the independent variables of the test data.

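Program 2 creates a brand-new StandardScaler that has never seen the training data, so it has no mean or variance to apply and transform raises NotFittedError. You have to pickle the fitted StandardScaler as well and load it in the second program: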

# train and pickle
sc = StandardScaler()
X = sc.fit_transform(DataSource_train_independent.values)

tree_reg = DecisionTreeRegressor()
tree_reg.fit(X, y)

pickle.dump(sc, open('StandardScaler.pk', 'wb'))
pickle.dump(tree_reg, open('DecisionTree.pk', 'wb'))

# load and predict
sc = pickle.load(open('StandardScaler.pk', 'rb'))
model = pickle.load(open('DecisionTree.pk', 'rb'))

X_test = sc.transform(testdata.values)
predictions = model.predict(X_test)
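
To see why the scaler has to travel with the model, here is a minimal sketch on synthetic data. The arrays and variable names are made up for illustration and are not taken from the question. It shows that a freshly constructed StandardScaler has no learned statistics and raises NotFittedError, while a scaler restored from pickle transforms data exactly like the original fitted one.

```python
import pickle

import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training and test features (assumption).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=100.0, scale=20.0, size=(50, 3))
X_new = rng.normal(loc=100.0, scale=20.0, size=(5, 3))

# The learned mean and scale live on this particular instance.
fitted = StandardScaler().fit(X_train)

# A new instance knows nothing about X_train, so transform() fails.
try:
    StandardScaler().transform(X_new)
    fresh_scaler_failed = False
except NotFittedError:
    fresh_scaler_failed = True

# Round-trip the fitted scaler through pickle (in memory here;
# writing to a file with pickle.dump/pickle.load behaves the same way).
restored = pickle.loads(pickle.dumps(fitted))

same_output = np.allclose(restored.transform(X_new), fitted.transform(X_new))
print(fresh_scaler_failed, same_output)
```

The restored scaler carries the same mean_ and scale_ attributes as the one fitted in the training program, which is what makes a transform-only call valid in the second program.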

A better approach is to wrap all the steps in a single pipeline:

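from sklearn.pipeline import Pipeline

pipeline = Pipeline(steps=[('sc', StandardScaler()), 
                           ('tree_reg', DecisionTreeRegressor())])

# fit on the raw (unscaled) training features; the pipeline scales them itself
pipeline.fit(DataSource_train_independent.values, y)
pipeline.predict(testdata.values)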
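
With a pipeline there is only one object to save and load. The following is a rough sketch of that round trip, assuming synthetic data in place of the stock spreadsheet. The arrays, the random_state values and the in-memory payload variable are illustrative choices, not part of the original answer.

```python
import pickle

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic data standing in for the stock features and target (assumption).
rng = np.random.default_rng(42)
X_train = rng.normal(size=(100, 4))
y_train = X_train @ np.array([1.0, -2.0, 0.5, 3.0])
X_new = rng.normal(size=(10, 4))

pipeline = Pipeline(steps=[("sc", StandardScaler()),
                           ("tree_reg", DecisionTreeRegressor(random_state=0))])
pipeline.fit(X_train, y_train)  # raw features; scaling happens inside

# "Program 1": serialise the whole fitted pipeline as one object.
payload = pickle.dumps(pipeline)

# "Program 2": restore it and predict directly on raw, unscaled data.
loaded_pipeline = pickle.loads(payload)
predictions = loaded_pipeline.predict(X_new)

matches_original = np.allclose(predictions, pipeline.predict(X_new))
print(predictions.shape, matches_original)
```

In the two-program setup from the question, pickle.dumps and pickle.loads would be replaced by pickle.dump and pickle.load on a file opened in 'wb' and 'rb' mode, as in the scaler example above.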

