Loading a pre-trained Keras model into SageMaker - local classification works, but SageMaker classification changes

Question

EDIT: Found a solution, see bottom of post.

I have a pre-trained Keras model (model.h5), which is a CNN for image classification. My goal is to deploy the model on SageMaker and use a Lambda function to call the SageMaker endpoint and make predictions. When I predict with the model on my local machine using the following code, I get the results I expect:

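# Imports the snippet relies on (tensorflow.keras assumed)
from tensorflow.keras.models import load_model
from PIL import Image
import numpy as np

model = load_model(r'model.h5')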
photo_fp = r'/path/to/photo.jpg'

img = Image.open(photo_fp).resize((128,128))
image_array = np.array(img) / 255.
img_batch = np.expand_dims(image_array, axis=0)

print(model.predict(img_batch))
# [[9.9984562e-01 1.5430539e-04 2.2775747e-14 9.5851349e-16]]

However, when I deploy the model as an endpoint on SageMaker, I get different results. Below is my code to deploy the model as an endpoint:

model = load_model(r'model.h5')

import tensorflow as tf
from tensorflow import keras
import sagemaker
import boto3, re
from sagemaker import get_execution_role
def convert_h5_to_aws(loaded_model):
    # Interpreted from 'Data Liam'
    from tensorflow.python.saved_model import builder
    from tensorflow.python.saved_model.signature_def_utils import predict_signature_def
    from tensorflow.python.saved_model import tag_constants
    
    model_version = '1'
    export_dir = 'export/Servo/' + model_version
    
    # Build the Protocol Buffer SavedModel at 'export_dir'
    builder = builder.SavedModelBuilder(export_dir)
    
    # Create prediction signature to be used by TensorFlow Serving Predict API
    signature = predict_signature_def(
        inputs={"inputs": loaded_model.input}, outputs={"score": loaded_model.output})

    with tf.compat.v1.Session() as sess:
        init = tf.global_variables_initializer()
        sess.run(init)
        # Save the meta graph and variables
        builder.add_meta_graph_and_variables(
            sess=sess, tags=[tag_constants.SERVING], signature_def_map={"serving_default": signature})
        builder.save()
    
    #create a tarball/tar file and zip it
    import tarfile
    with tarfile.open('model.tar.gz', mode='w:gz') as archive:
        archive.add('export', recursive=True)
        
convert_h5_to_aws(model)

sagemaker_session = sagemaker.Session()
inputs = sagemaker_session.upload_data(path='model.tar.gz', key_prefix='model')

!touch train.py # from notebook
# the (default) IAM role
role = get_execution_role()
framework_version = tf.__version__

# Create Sagemaker model
from sagemaker.tensorflow.model import TensorFlowModel
sagemaker_model = TensorFlowModel(model_data = 's3://' + sagemaker_session.default_bucket() + '/model/model.tar.gz',
                                  role = role,
                                  framework_version = framework_version,
                                  entry_point = 'train.py')

predictor = sagemaker_model.deploy(initial_instance_count=1,
                                   instance_type='ml.m4.xlarge')

This deploys without errors and creates an endpoint. Then, I invoke the endpoint:

runtime = boto3.client('runtime.sagemaker')
endpoint_name = 'endpoint-name-for-stackoverflow'

img = Image.open(photo_fp).resize((128,128))
image_array = np.array(img) / 255.
img_batch = np.expand_dims(image_array, axis=0)
predictor = TensorFlowPredictor(endpoint_name)
result = predictor.predict(data=img_batch)
print(result)
# {'predictions': [[0.199595317, 0.322404563, 0.209394112, 0.268606]]}

As you can see, the endpoint assigns nearly equal probabilities to all four classes, which is not what the same model predicted on the local machine. This leads me to believe that something is going wrong in my deployment.
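To put a number on the mismatch, the two probability vectors can be compared directly. The sketch below uses only the values printed in this post; `compare_predictions` is a hypothetical helper written for illustration and is not part of Keras or the SageMaker SDK.

```python
def compare_predictions(local, remote):
    """Return (max absolute difference, local argmax, remote argmax)."""
    if len(local) != len(remote):
        raise ValueError("prediction vectors must have the same length")
    # Largest per-class gap between the two probability vectors
    max_diff = max(abs(a - b) for a, b in zip(local, remote))
    # Index of the highest-probability class in each vector
    local_top = max(range(len(local)), key=local.__getitem__)
    remote_top = max(range(len(remote)), key=remote.__getitem__)
    return max_diff, local_top, remote_top


# Values copied from the local and endpoint outputs shown above
local_pred = [9.9984562e-01, 1.5430539e-04, 2.2775747e-14, 9.5851349e-16]
endpoint_pred = [0.199595317, 0.322404563, 0.209394112, 0.268606]

max_diff, local_top, remote_top = compare_predictions(local_pred, endpoint_pred)
print(f"max abs diff: {max_diff:.3f}, local class: {local_top}, endpoint class: {remote_top}")
```

A gap of roughly 0.8 together with a different top class is far beyond floating-point noise, so the exported model or the serving setup is the more likely place to look.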

I have tried loading the model weights and the JSON model structure to SageMaker rather than the entire h5 model, but that yielded the same results. I also used invoke_endpoint instead of the predictor API, with the following code:

import json

payload = json.dumps(img_batch.tolist())
response = runtime.invoke_endpoint(EndpointName=endpoint_name,
                                   ContentType='application/json',
                                   Body=payload)
result = json.loads(response['Body'].read().decode())
print(result)
# {'predictions': [[0.199595317, 0.322404563, 0.209394112, 0.268606]]}

But yet again, I get the same results.
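For reference, the request body can also be written in the explicit row format of the TensorFlow Serving REST API (`{"instances": [...]}`). This is a sketch only, assuming the endpoint is backed by that API, as the `{'predictions': ...}` response shape suggests. `build_payload` is a hypothetical helper and the tiny nested list stands in for `img_batch.tolist()`.

```python
import json


def build_payload(batch):
    """Wrap a nested list of shape (batch, height, width, channels) in row format."""
    return json.dumps({"instances": batch})


# Stand-in for img_batch.tolist(): one 2x2 RGB "image" already scaled to [0, 1]
fake_batch = [[[[0.0, 0.5, 1.0], [0.1, 0.2, 0.3]],
               [[0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]]]

payload = build_payload(fake_batch)
decoded = json.loads(payload)
print(len(decoded["instances"]), "image(s) in payload")
```

The resulting string would be passed as `Body` to `invoke_endpoint` with `ContentType='application/json'`, exactly as in the call above. Changing the payload shape alone would not be expected to fix wrong probabilities; it only rules out the request format as a variable.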

Any ideas why I'm getting different results on SageMaker than on my local machine with the same model? Thanks!

EDIT: Found a solution. The problem was with the TensorFlowModel framework_version argument. I changed framework_version to '1.12', installed TF 1.12 in the SageMaker Jupyter instance, and retrained my model locally using TF 1.12. I'm not totally sure why this works, but all of the blogs I found (e.g. this one) used 1.12. Hope this helps.

Answer 1

For the benefit of the community, providing the solution in the answer section.

The problem was with the TensorFlowModel framework_version argument. After changing framework_version to 1.12, installing TF 1.12 in the SageMaker Jupyter instance, and retraining the model locally with TF 1.12, the endpoint returned the same results as the local model. (Paraphrased from Peter Van Katwyk.)
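In practice the fix means keeping the TensorFlow version used for training and export in step with the `framework_version` passed to the SageMaker model. A minimal sketch of such a guard follows; `major_minor` and `check_framework_version` are hypothetical helpers written for illustration, not part of the SageMaker SDK, and they assume plain dotted version strings such as `tf.__version__`.

```python
def major_minor(version):
    """Reduce a version string such as '1.12.0' to 'major.minor' ('1.12')."""
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least major.minor, got {version!r}")
    return ".".join(parts[:2])


def check_framework_version(training_version, framework_version):
    """Return True when both versions share the same major.minor release."""
    return major_minor(training_version) == major_minor(framework_version)


print(check_framework_version("1.12.0", "1.12"))  # True
print(check_framework_version("2.1.0", "1.12"))   # False
```

In a deployment script, the first argument would be `tf.__version__` from the environment that trained the model and the second the string given as `framework_version`. Failing fast on a mismatch avoids deploying an endpoint that loads but predicts incorrectly.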
