How to log a sklearn pipeline with a Keras step using mlflow.pyfunc.log_model()? TypeError: can't pickle _thread.RLock objects
I would like to log a sklearn pipeline with a Keras step to MLflow.
The pipeline has 2 steps: a sklearn StandardScaler and a Keras TensorFlow model.
I am using mlflow.pyfunc.log_model() as a possible solution, but I am getting this error:
TypeError: can't pickle _thread.RLock objects
---> mlflow.pyfunc.log_model("test1", python_model=wrappedModel, signature=signature)
Here is my code:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import keras
from keras import layers, Input
from keras.wrappers.scikit_learn import KerasRegressor
import mlflow.pyfunc
from sklearn.pipeline import Pipeline
from mlflow.models.signature import infer_signature
#toy dataframe
df1 = pd.DataFrame([[1,2,3,4,5,6], [10,20,30,40,50,60],[100,200,300,400,500,600]] )
#create train test datasets
X_train, X_test = train_test_split(df1, random_state=42, shuffle=True)
#scale X_train
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_train_s = pd.DataFrame(X_train_s)
#wrap the keras model to use it inside of sklearn pipeline
def create_model(optimizer='adam', loss='mean_squared_error', s = X_train.shape[1]):
    input_layer = keras.Input(shape=(s,))
    # "encoded" is the encoded representation of the input
    encoded = layers.Dense(25, activation='relu')(input_layer)
    encoded = layers.Dense(2, activation='relu')(encoded)
    # "decoded" is the lossy reconstruction of the input
    decoded = layers.Dense(2, activation='relu')(encoded)
    # chain from the previous decoder layer (the original passed "encoded" again, discarding the layer above)
    decoded = layers.Dense(25, activation='relu')(decoded)
    decoded = layers.Dense(s, activation='linear')(decoded)
    model = keras.Model(input_layer, decoded)
    model.compile(optimizer, loss)
    return model
# wrap the model
model = KerasRegressor(build_fn=create_model, verbose=1)
# create the pipeline
pipe = Pipeline(steps=[
    ('scale', StandardScaler()),
    ('model', model)
])
#function to wrap the pipeline to be logged by mlflow
class SklearnModelWrapper(mlflow.pyfunc.PythonModel):
    def __init__(self, model):
        self.model = model
    def predict(self, context, model_input):
        return self.model.predict(model_input)[:,1]
mlflow.end_run()
with mlflow.start_run(run_name='test1'):
    #train the pipeline
    pipe.fit(X_train, X_train_s, model__epochs=2)
    #wrap the model for mlflow log
    wrappedModel = SklearnModelWrapper(pipe)
    # Log the model with a signature that defines the schema of the model's inputs and outputs.
    signature = infer_signature(X_train, wrappedModel.predict(None, X_train))
    mlflow.pyfunc.log_model("test1", python_model=wrappedModel, signature=signature)
From what I googled, it seems like this type of error is related to thread concurrency. It could then be related to TensorFlow, since it distributes the computation during the model training phase.
However, the offending line comes after the training phase. If I remove this line, the rest of the code works, which makes me think the error happens after the concurrent phase of model training. I have no idea why I am getting this error in this context.
I am a beginner. Can someone please help me? Thanks
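The message itself points at pickling rather than at training threads. As far as I can tell, mlflow.pyfunc.log_model() serializes the python_model instance, and anything reachable from it that holds a lock object cannot be pickled. Below is a minimal sketch using only the standard library; the dictionaries and the helper pickle_error are made up for illustration and merely stand in for a wrapper whose Keras step holds a lock internally (that this is what happens inside Keras is an assumption, not something verified against its source).

```python
import pickle
import threading

def pickle_error(obj):
    """Return the error message raised when pickling obj, or None on success."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError) as exc:
        return str(exc)
    return None

# No threads are started here; merely holding a lock object is enough.
state_with_lock = {"weights": [0.1, 0.2], "lock": threading.RLock()}
state_without_lock = {"weights": [0.1, 0.2]}

message = pickle_error(state_with_lock)
print(message)                           # exact wording varies by Python version
print(pickle_error(state_without_lock))  # None
```

This would explain why the failure shows up only at the log_model() line: training has already finished, but the fitted object still carries the lock.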
I think python_model=wrappedModel should be python_model=SklearnModelWrapper().
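One way to read this suggestion is that the object handed to log_model() should not carry the unpicklable part of the model in its pickled state. A rough sketch of that idea in plain Python follows: the wrapper drops the lock when it is pickled and recreates it when it is loaded. The class ModelWrapper, its weights attribute and its predict method are hypothetical stand-ins, not MLflow or Keras APIs. In MLflow the comparable hook would presumably be PythonModel.load_context together with the artifacts argument of log_model(), but that wiring is an assumption and is not shown here.

```python
import pickle
import threading

class ModelWrapper:
    """Hypothetical wrapper: picklable weights plus an unpicklable lock."""

    def __init__(self, weights):
        self.weights = weights
        # Stands in for framework internals that cannot be pickled.
        self._lock = threading.RLock()

    def __getstate__(self):
        # Copy the instance state and leave the lock out of the pickle.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the picklable state, then create a fresh lock.
        self.__dict__.update(state)
        self._lock = threading.RLock()

    def predict(self, x):
        with self._lock:
            return [w * x for w in self.weights]

wrapper = ModelWrapper([1.0, 2.0])
restored = pickle.loads(pickle.dumps(wrapper))
print(restored.predict(3))
```

The round trip succeeds because the lock never reaches the pickler. For a real Keras model the excluded part would have to be saved in its own format and reloaded, which this sketch does not attempt.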