How to log a sklearn pipeline with a Keras step using mlflow.pyfunc.log_model()? TypeError: can't pickle _thread.RLock objects

Question

I would like to log a sklearn pipeline with a Keras step to MLflow.

The pipeline has 2 steps: a sklearn StandardScaler and a Keras TensorFlow model.

I am using mlflow.pyfunc.log_model() as a possible solution, but I am getting this error:

TypeError: can't pickle _thread.RLock objects
--->   mlflow.pyfunc.log_model("test1", python_model=wrappedModel, signature=signature)

Here is my code:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import keras
from keras import layers, Input
from keras.wrappers.scikit_learn import KerasRegressor
import mlflow.pyfunc
from sklearn.pipeline import Pipeline
from mlflow.models.signature import infer_signature

#toy dataframe
df1 = pd.DataFrame([[1,2,3,4,5,6], [10,20,30,40,50,60],[100,200,300,400,500,600]] )

#create train test datasets
X_train, X_test = train_test_split(df1, random_state=42, shuffle=True)

#scale X_train
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_train_s = pd.DataFrame(X_train_s)

#wrap the keras model to use it inside of sklearn pipeline
def create_model(optimizer='adam', loss='mean_squared_error', s = X_train.shape[1]):
  input_layer = keras.Input(shape=(s,))
  # "encoded" is the encoded representation of the input
  encoded = layers.Dense(25, activation='relu')(input_layer)
  encoded = layers.Dense(2, activation='relu')(encoded)

  # "decoded" is the lossy reconstruction of the input
  decoded = layers.Dense(2, activation='relu')(encoded)
  decoded = layers.Dense(25, activation='relu')(decoded)
  decoded = layers.Dense(s, activation='linear')(decoded)
  
  model = keras.Model(input_layer, decoded)
  model.compile(optimizer, loss)
  return model

# wrap the model
model = KerasRegressor(build_fn=create_model, verbose=1)

# create the pipeline
pipe = Pipeline(steps=[
    ('scale', StandardScaler()),
    ('model',model)
])

#class to wrap the pipeline so it can be logged by mlflow
class SklearnModelWrapper(mlflow.pyfunc.PythonModel):
  def __init__(self, model):
    self.model = model
    
  def predict(self, context, model_input):
    return self.model.predict(model_input)[:,1]
  
  
mlflow.end_run()
with mlflow.start_run(run_name='test1'):

  #train the pipeline
  pipe.fit(X_train, X_train_s, model__epochs=2)
  
  #wrap the model for mlflow log
  wrappedModel = SklearnModelWrapper(pipe)

  # Log the model with a signature that defines the schema of the model's inputs and outputs. 
  signature = infer_signature(X_train, wrappedModel.predict(None, X_train))
  mlflow.pyfunc.log_model("test1", python_model=wrappedModel, signature=signature)

From what I googled, this type of error seems to be related to thread concurrency. It could then be related to TensorFlow, since it distributes the work during the model training phase.

However, the offending line comes after the training phase. If I remove this line, the rest of the code works, which makes me think the error happens after the concurrent part of model training. I have no idea why I am getting this error in this context. I am a beginner. Can someone please help me? Thanks.
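
A note on where the error likely comes from: mlflow.pyfunc.log_model() serializes the python_model instance with cloudpickle, so every attribute of the wrapper, including the fitted pipeline and its Keras step, has to be picklable. Any object that holds a thread lock fails at that point, whether or not threads are still running. The sketch below reproduces the same class of error with the standard library only; HoldsLock and try_pickle are hypothetical names used for illustration, and the assumption that the Keras model inside the pipeline holds such a lock is not confirmed by the traceback shown above.

```python
import pickle
import threading


class HoldsLock:
    """Hypothetical stand-in for any object that keeps a lock internally."""

    def __init__(self):
        self.value = 42
        # A lock stored as an attribute makes the whole object unpicklable.
        self._lock = threading.RLock()


def try_pickle(obj):
    """Return None if obj pickles cleanly, otherwise the error message."""
    try:
        pickle.dumps(obj)
        return None
    except TypeError as exc:
        return str(exc)


# No thread is running here, yet pickling still fails because of the lock.
error = try_pickle(HoldsLock())
print(error)  # the message names _thread.RLock; exact wording varies by Python version
```

This would explain why the failure shows up on the log_model() line rather than during fit(): training finishes fine, and the error is raised only when the wrapper is serialized.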

Answer 1

I think python_model=wrappedModel should be python_model=SklearnModelWrapper().
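
A general way around this kind of pickling error is to keep the unpicklable object out of the pickled wrapper and rebuild it when the wrapper is loaded. The sketch below shows only the idea, using the standard library; FakeModel and LazyWrapper are hypothetical stand-ins, not MLflow or Keras classes. In MLflow, the hooks that play this role are the artifacts argument of mlflow.pyfunc.log_model() and the PythonModel.load_context() method, where a Keras model previously saved to a file could be reloaded. That mapping onto the pipeline above is an assumption and has not been tested against the code in the question.

```python
import pickle
import threading


class FakeModel:
    """Hypothetical stand-in for a fitted model that holds a lock."""

    def __init__(self, weight):
        self.weight = weight
        self._lock = threading.RLock()  # makes FakeModel unpicklable

    def predict(self, xs):
        with self._lock:
            return [x * self.weight for x in xs]


class LazyWrapper:
    """Pickles only plain data and rebuilds the model after loading."""

    def __init__(self, model):
        self.model = model

    def __getstate__(self):
        # Drop the unpicklable model; keep just what is needed to rebuild it.
        state = self.__dict__.copy()
        del state["model"]
        state["_weight"] = self.model.weight
        return state

    def __setstate__(self, state):
        # Rebuild the model from the saved plain data.
        weight = state.pop("_weight")
        self.__dict__.update(state)
        self.model = FakeModel(weight)

    def predict(self, xs):
        return self.model.predict(xs)


# Round trip: the wrapper pickles even though FakeModel alone would not.
restored = pickle.loads(pickle.dumps(LazyWrapper(FakeModel(2))))
print(restored.predict([1, 2, 3]))
```

With a real Keras model, the "plain data" would be a saved model file rather than a single number, which is what the artifacts mechanism is meant to carry.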

0 2021-01-14 13:21:28
