How to save a Keras model in Kedro

Question

I am able to save my DNN model to S3 in h5 format. But when I load it in the inference pipeline of my Kedro project, the output is blank: there are no predictions. I made the following changes in the catalog.yml file:

model:
  filepath: s3://ds-kedro/cuisine-classification-model/06_models/model.h5
  layer: models
  type: kedro.extras.datasets.tensorflow.TensorFlowModelDataset

I made the following changes in nodes.py:

# Imports inferred from the names used below; the original post omitted them.
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import Dense, Embedding, GlobalMaxPool1D
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow_addons.metrics import HammingLoss


def train_model(multilabel_df: pd.DataFrame):
    """Tokenize restaurant names and train a Deep Neural Network (DNN) to predict cuisines.

    Args:
        multilabel_df: Contains restaurant names and cuisine codes.
    Returns:
        A dict with the trained model and its training history.
    """
    maxlen = 200
    # Convert the restaurant names to padded integer sequences.
    tokenizer = Tokenizer(num_words=5000, lower=True)
    tokenizer.fit_on_texts(multilabel_df['detailed_name'])
    sequences = tokenizer.texts_to_sequences(multilabel_df['detailed_name'])
    x = pad_sequences(sequences, maxlen=maxlen)
    # Every column after the first is treated as a cuisine label.
    X_train, X_test, y_train, y_test = train_test_split(
        x,
        multilabel_df[multilabel_df.columns[1:]],
        test_size=0.1,
        random_state=42,
    )
    num_classes = y_train.shape[1]
    max_words = len(tokenizer.word_index) + 1
    # One sigmoid output per cuisine, since this is a multilabel problem.
    model = Sequential()
    model.add(Embedding(max_words, 20, input_length=maxlen))
    model.add(GlobalMaxPool1D())
    model.add(Dense(num_classes, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', metrics=['acc'])
    history = model.fit(
        X_train,
        y_train,
        epochs=1,
        batch_size=32,
        validation_split=0.3,
    )
    metrics = model.evaluate(X_test, y_test)
    print("{}: {}".format(model.metrics_names[1], metrics[1]))
    print('Predicting....')
    y_pred = model.predict(X_test, verbose=1)
    metric = HammingLoss(mode='multilabel', threshold=0.5)
    metric.update_state(y_test, y_pred)
    print("Hamming Loss is:", metric.result().numpy())
    # model.save('model.h5')  # would create an HDF5 file 'model.h5'
    # return model
    return dict(
        model=model,
        model_history=history.history,
    )

I tried different approaches, for example returning the model from the training node and then passing it as a parameter to the inference pipeline.

def inference_pipeline(model, inference_data):
    ...  # pipeline code (omitted in the original post)

It would be really helpful if someone could figure out what is going wrong here, because I get no error but also no predictions (blank values).

Answer 1

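Hi @Rajesh, this is where you should save the output via pickle.PickleDataSet.

The dataset supports multiple backends and defaults to Python's built-in pickle, but you can pass it another backend such as joblib or dill if that helps.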
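
A minimal sketch of what this could look like, written as the Python equivalent of a catalog.yml entry. The dataset name `model`, the local file path, and the helper `build_catalog` are assumptions for illustration, and the class name may be spelled differently in newer Kedro releases. Whether a Keras model pickles cleanly also depends on the TensorFlow version, so treat this as something to try rather than a guaranteed fix.

```python
# Hypothetical catalog entry, mirroring what would go in catalog.yml.
# The dataset name "model" and the file path are made up for this sketch.
PICKLE_CATALOG = {
    "model": {
        "type": "pickle.PickleDataSet",
        "filepath": "data/06_models/model.pkl",
        "backend": "joblib",  # omit this key to use the default backend
    }
}


def build_catalog(config):
    """Turn the dict into a Kedro DataCatalog (requires kedro to be installed)."""
    from kedro.io import DataCatalog  # imported lazily so the dict stands alone

    return DataCatalog.from_config(config)
```

In a real project the same keys (`type`, `filepath`, `backend`) would normally live in catalog.yml rather than in Python code.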

Answer 2

You can always save a Keras model in Kedro in HDF5 format. You need to install support for the tensorflow.TensorFlowModelDataset dataset as an extra, using

pip install kedro[<specify extra dataset>]

Then add an entry to the catalog.yml file:

your_model:
  type: tensorflow.TensorFlowModelDataset
  filepath: <path to save in local/s3>/your_model.hd5
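
One detail that may be worth checking with an entry like this: as far as I can tell, TensorFlowModelDataset writes TensorFlow's SavedModel directory format by default and only writes a single HDF5 file when `save_args` sets `save_format` to `h5`. The sketch below shows such an entry as a Python dict that mirrors catalog.yml. The local path is hypothetical, and the behavior should be verified against the Kedro version in use.

```python
# Hypothetical catalog entry, mirroring what would go in catalog.yml.
# "your_model" comes from the answer; the file path is made up for this sketch.
TF_CATALOG = {
    "your_model": {
        "type": "tensorflow.TensorFlowModelDataset",
        "filepath": "data/06_models/your_model.h5",
        # Assumption: without this, the dataset saves a SavedModel directory.
        "save_args": {"save_format": "h5"},
    }
}
```

If the entry is left without `save_args`, a path ending in an HDF5-style extension may end up holding a directory rather than a single file, which is one thing to rule out when a loaded model behaves unexpectedly.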

You can then use your_model directly in the inference pipeline to make predictions.
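
As a rough sketch of how the pieces could be wired together: the `train_model` function in the question returns a dict, so the training node's outputs can map each returned key to a catalog dataset name. The names `predict_cuisines`, `inference_sequences`, `predictions`, and `create_pipeline` are hypothetical, and the inference input is assumed to be already tokenized and padded in the same way as the training data.

```python
def predict_cuisines(model, padded_sequences, threshold=0.5):
    """Inference node: turn per-class sigmoid scores into 0/1 labels."""
    scores = model.predict(padded_sequences)
    return [[int(score >= threshold) for score in row] for row in scores]


# train_model in the question returns {"model": ..., "model_history": ...},
# so each returned key is mapped to the dataset name it should be saved under.
TRAIN_OUTPUTS = {"model": "your_model", "model_history": "model_history"}


def create_pipeline(train_model):
    """Wire training and inference together (requires kedro to be installed)."""
    from kedro.pipeline import Pipeline, node  # lazy import, see docstring

    return Pipeline(
        [
            node(train_model, inputs="multilabel_df", outputs=TRAIN_OUTPUTS),
            node(
                predict_cuisines,
                inputs=["your_model", "inference_sequences"],
                outputs="predictions",
            ),
        ]
    )
```

Note that the tokenizer fitted during training is not returned by `train_model` in the question. If the inference data is tokenized with a differently fitted tokenizer, the predictions may be meaningless, so persisting the tokenizer as its own dataset is probably worth considering.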
