How to log an sklearn pipeline with a Keras step using mlflow.pyfunc.log_model()? TypeError: can't pickle _thread.RLock objects

Question

I want to log an sklearn pipeline with a Keras step to MLflow.

The pipeline has 2 steps: an sklearn StandardScaler and a TensorFlow Keras model wrapped for sklearn.

I tried mlflow.pyfunc.log_model() as a possible solution, but I get this error:

TypeError: can't pickle _thread.RLock objects
--->   mlflow.pyfunc.log_model("test1", python_model=wrappedModel, signature=signature)

Here is my code:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import keras
from keras import layers, Input
from keras.wrappers.scikit_learn import KerasRegressor
import mlflow.pyfunc
from sklearn.pipeline import Pipeline
from mlflow.models.signature import infer_signature

#toy dataframe
df1 = pd.DataFrame([[1,2,3,4,5,6], [10,20,30,40,50,60],[100,200,300,400,500,600]] )

#create train test datasets
X_train, X_test = train_test_split(df1, random_state=42, shuffle=True)

#scale X_train
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_train_s = pd.DataFrame(X_train_s)

#wrap the keras model to use it inside of sklearn pipeline
def create_model(optimizer='adam', loss='mean_squared_error', s = X_train.shape[1]):
  input_layer = keras.Input(shape=(s,))
  # "encoded" is the encoded representation of the input
  encoded = layers.Dense(25, activation='relu')(input_layer)
  encoded = layers.Dense(2, activation='relu')(encoded)

  # "decoded" is the lossy reconstruction of the input
  decoded = layers.Dense(2, activation='relu')(encoded)
  decoded = layers.Dense(25, activation='relu')(decoded)
  decoded = layers.Dense(s, activation='linear')(decoded)
  
  model = keras.Model(input_layer, decoded)
  model.compile(optimizer, loss)
  return model

# wrap the model
model = KerasRegressor(build_fn=create_model, verbose=1)

# create the pipeline
pipe = Pipeline(steps=[
    ('scale', StandardScaler()),
    ('model',model)
])

#function to wrap the pipeline to be logged by mlflow
class SklearnModelWrapper(mlflow.pyfunc.PythonModel):
  def __init__(self, model):
    self.model = model
    
  def predict(self, context, model_input):
    return self.model.predict(model_input)[:,1]
  
  
mlflow.end_run()
with mlflow.start_run(run_name='test1'):

  #train the pipeline
  pipe.fit(X_train, X_train_s, model__epochs=2)
  
  #wrap the model for mlflow log
  wrappedModel = SklearnModelWrapper(pipe)

  # Log the model with a signature that defines the schema of the model's inputs and outputs. 
  signature = infer_signature(X_train, wrappedModel.predict(None, X_train))
  mlflow.pyfunc.log_model("test1", python_model=wrappedModel, signature=signature)
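As far as I know, log_model() serializes the python_model instance (with cloudpickle), so everything reachable from wrappedModel, including the fitted pipeline and the Keras model inside KerasRegressor, has to be picklable. To see which nested attribute is to blame, a small debugging helper can walk the object and re-try pickling at each level. The sketch below is a hypothetical helper (find_unpicklable is not part of MLflow or sklearn), and FakeKerasModel / FakePipeline are hypothetical stand-ins that only mimic "a model holding a lock", not real Keras internals.

```python
import pickle
import threading


def find_unpicklable(obj, path="obj", seen=None):
    """Hypothetical debugging helper: return the deepest attribute paths
    under `obj` that pickle cannot serialize.

    It descends into dict values, list/tuple items and instance __dict__s.
    An object reachable through several paths is only reported once.
    """
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return []
    seen.add(id(obj))

    try:
        pickle.dumps(obj)
        return []  # this object and everything under it pickles fine
    except Exception:
        pass

    # Collect the children that pickle would also have to serialize
    if isinstance(obj, dict):
        children = [(f"{path}[{k!r}]", v) for k, v in obj.items()]
    elif isinstance(obj, (list, tuple)):
        children = [(f"{path}[{i}]", v) for i, v in enumerate(obj)]
    elif hasattr(obj, "__dict__"):
        children = [(f"{path}.{k}", v) for k, v in vars(obj).items()]
    else:
        children = []

    bad = []
    for child_path, child in children:
        bad.extend(find_unpicklable(child, child_path, seen))
    # If no child is to blame, the object itself cannot be pickled
    return bad or [path]


class FakeKerasModel:
    """Hypothetical stand-in for a compiled model that stores a lock."""

    def __init__(self):
        self.weights = [1.0, 2.0]
        self._lock = threading.RLock()


class FakePipeline:
    """Hypothetical stand-in for sklearn's Pipeline (a list of named steps)."""

    def __init__(self, steps):
        self.steps = steps


pipe = FakePipeline([("scale", {"mean": 0.0}), ("model", FakeKerasModel())])
print(find_unpicklable(pipe, "pipe"))
```

Here the helper should point at the _lock attribute of the stand-in model rather than at the pipeline or the scaler step. Running the same helper on the real wrappedModel should narrow the failure down in the same way, although real objects with a custom __reduce__ may not expose their state through __dict__.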

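From what I have found, this type of error seems to be related to thread concurrency. It might therefore be related to TensorFlow, since TensorFlow distributes the work during the model training phase.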
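For reference, pickle raises this TypeError whenever it reaches a lock object while serializing, whether or not any thread is running; the lock only has to exist somewhere in the object graph. A minimal standard-library sketch, where HoldsLock and try_pickle are hypothetical names standing in for a model that stores a lock and for the save call:

```python
import pickle
import threading


class HoldsLock:
    """Hypothetical stand-in for an object (e.g. a compiled model) storing a lock."""

    def __init__(self):
        self.data = [1, 2, 3]
        self._lock = threading.RLock()  # created, but never used by any thread


def try_pickle(obj):
    """Return None if `obj` pickles, otherwise the TypeError message."""
    try:
        pickle.dumps(obj)
        return None
    except TypeError as exc:
        return str(exc)


# No threads are started anywhere in this script
print(try_pickle(HoldsLock()))
# Wrapping the object does not help: pickle serializes everything reachable
print(try_pickle({"model": HoldsLock()}))
# Plain data pickles fine
print(try_pickle([1, 2, 3]))
```

The exact message depends on the Python version ("can't pickle _thread.RLock objects" on older releases, "cannot pickle '_thread.RLock' object" on newer ones), but in both cases it comes from serialization, not from concurrent training.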

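However, the failing line runs after the training phase. If I remove this line, the rest of the code works, which makes me think the error happens after the concurrent phase of model training. I don't understand why this error appears in this case. I'm a beginner. Can someone help me? Thanks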
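A common way around "can't pickle" errors is to keep the unpicklable part out of the pickled state and rebuild it after loading. In MLflow, the usual route for this (as far as I know) is to save the Keras model separately, pass its path through the artifacts argument of log_model(), and reload it in PythonModel.load_context(), so the wrapper itself no longer carries the model. The sketch below shows only the generic Python mechanism behind that idea, using __getstate__ / __setstate__ on a hypothetical LockedModel class; it is not MLflow or Keras code.

```python
import pickle
import threading


class LockedModel:
    """Hypothetical model that keeps a lock next to its picklable weights."""

    def __init__(self, weights):
        self.weights = list(weights)
        self._lock = threading.RLock()

    def predict(self, x):
        # The lock guards access to the weights
        with self._lock:
            return [w * x for w in self.weights]

    def __getstate__(self):
        # Copy the instance dict and drop the unpicklable lock
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the picklable fields, then create a fresh lock
        self.__dict__.update(state)
        self._lock = threading.RLock()


model = LockedModel([1.0, 2.0])
restored = pickle.loads(pickle.dumps(model))
print(restored.predict(3))  # [3.0, 6.0]
```

The design choice is the same in both settings: persist only data that can be serialized, and recreate runtime-only objects (locks, sessions, loaded models) when the object is loaded again.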

Answer 1

I think python_model=wrappedModel should be python_model=SklearnModelWrapper().

Solution 1
0 2021-01-14 13:21:28