I would like to log a sklearn pipeline with a Keras step to MLflow.
The pipeline has 2 steps: a sklearn StandardScaler and a Keras TensorFlow model.
I am using mlflow.pyfunc.log_model() as a possible solution, but I get this error:
TypeError: can't pickle _thread.RLock objects
---> mlflow.pyfunc.log_model("test1", python_model=wrappedModel, signature=signature)
Here is my code:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import keras
from keras import layers, Input
from keras.wrappers.scikit_learn import KerasRegressor
import mlflow.pyfunc
from sklearn.pipeline import Pipeline
from mlflow.models.signature import infer_signature
#toy dataframe
df1 = pd.DataFrame([[1,2,3,4,5,6], [10,20,30,40,50,60],[100,200,300,400,500,600]] )
#create train test datasets
X_train, X_test = train_test_split(df1, random_state=42, shuffle=True)
#scale X_train
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_train_s = pd.DataFrame(X_train_s)
#wrap the keras model to use it inside of sklearn pipeline
def create_model(optimizer='adam', loss='mean_squared_error', s=X_train.shape[1]):
    input_layer = keras.Input(shape=(s,))
    # "encoded" is the encoded representation of the input
    encoded = layers.Dense(25, activation='relu')(input_layer)
    encoded = layers.Dense(2, activation='relu')(encoded)
    # "decoded" is the lossy reconstruction of the input
    decoded = layers.Dense(2, activation='relu')(encoded)
    # chain from the previous decoder layer, not from "encoded" again
    decoded = layers.Dense(25, activation='relu')(decoded)
    decoded = layers.Dense(s, activation='linear')(decoded)
    model = keras.Model(input_layer, decoded)
    model.compile(optimizer, loss)
    return model
# wrap the model
model = KerasRegressor(build_fn=create_model, verbose=1)
# create the pipeline
pipe = Pipeline(steps=[
    ('scale', StandardScaler()),
    ('model', model)
])
# class to wrap the pipeline so it can be logged by mlflow
class SklearnModelWrapper(mlflow.pyfunc.PythonModel):
    def __init__(self, model):
        self.model = model

    def predict(self, context, model_input):
        return self.model.predict(model_input)[:, 1]
mlflow.end_run()
with mlflow.start_run(run_name='test1'):
    # train the pipeline
    pipe.fit(X_train, X_train_s, model__epochs=2)
    # wrap the model for mlflow log
    wrappedModel = SklearnModelWrapper(pipe)
    # Log the model with a signature that defines the schema of the model's inputs and outputs.
    signature = infer_signature(X_train, wrappedModel.predict(None, X_train))
    mlflow.pyfunc.log_model("test1", python_model=wrappedModel, signature=signature)
From what I googled, this type of error seems to be related to thread concurrency. It could therefore be related to TensorFlow, since it distributes the work during the model training phase.
However, the offending line comes after the training phase. If I remove this line, the rest of the code works, which makes me think the error happens after the concurrent part of model training is over. I have no idea why I am getting this error in this context. I am a beginner. Can someone please help me? Thanks
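The error message itself can be reproduced without TensorFlow or MLflow. A minimal sketch, assuming the cause is that the object being pickled holds a thread lock somewhere inside it (as a fitted Keras model may in some TensorFlow versions). `FakeModel` and `try_pickle` are hypothetical names used only for illustration:

```python
import pickle
import threading


class FakeModel:
    """Hypothetical stand-in for an object that holds a thread lock internally."""

    def __init__(self):
        self.weights = [0.1, 0.2, 0.3]
        self._lock = threading.RLock()  # locks cannot be pickled


def try_pickle(obj):
    """Return None if obj pickles cleanly, otherwise the TypeError message."""
    try:
        pickle.dumps(obj)
        return None
    except TypeError as exc:
        return str(exc)


error = try_pickle(FakeModel())
print(error)  # a TypeError message that mentions RLock
```

If this is what happens here, training is not the problem: the failure would come from serializing the wrapper (and the pipeline inside it) at logging time, which fits the observation that everything before the log_model line works.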
I think that instead of python_model=wrappedModel
it should be python_model=SklearnModelWrapper()
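Another direction, separate from the suggestion above, is to keep the unpicklable part out of the pickled state and rebuild it on load. The sketch below shows that pattern with the standard library only; it is not MLflow's API, and `PicklableWrapper` is a hypothetical name. With mlflow.pyfunc, the comparable hooks would be the artifacts argument of log_model and PythonModel.load_context, assuming the Keras model is saved to a file separately.

```python
import pickle
import threading


class PicklableWrapper:
    """Hypothetical wrapper that drops its lock when pickled and recreates it on load."""

    def __init__(self, weights):
        self.weights = weights
        self._lock = threading.RLock()

    def __getstate__(self):
        # Copy the instance state and leave out the attribute that cannot be pickled.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the picklable state, then rebuild the lock from scratch.
        self.__dict__.update(state)
        self._lock = threading.RLock()

    def predict(self, x):
        with self._lock:
            return [w * x for w in self.weights]


restored = pickle.loads(pickle.dumps(PicklableWrapper([1.0, 2.0])))
print(restored.predict(3))
```

The round trip succeeds because the lock never enters the pickled state; only the plain data does.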