
Loading a pretrained Keras model to SageMaker - local classification works but SageMaker classification changes

EDIT: Found a solution, see bottom of post.

I have a pre-trained Keras model (model.h5), a CNN for image classification. My goal is to deploy the model on SageMaker and use a Lambda function to interface with the SageMaker endpoint and make predictions. When I predict with the model on my local machine using the following code, I get the results I would expect:

from tensorflow.keras.models import load_model
from PIL import Image
import numpy as np

model = load_model(r'model.h5')
photo_fp = r'/path/to/photo.jpg'

img = Image.open(photo_fp).resize((128,128))
image_array = np.array(img) / 255.
img_batch = np.expand_dims(image_array, axis=0)

print(model.predict(img_batch))
# [[9.9984562e-01 1.5430539e-04 2.2775747e-14 9.5851349e-16]]

However, when I deploy the model as an endpoint on SageMaker, I get different results. Below is the code I use to deploy the model as an endpoint:

model = load_model(r'model.h5')

import tensorflow as tf
from tensorflow import keras
import sagemaker
import boto3, re
from sagemaker import get_execution_role
def convert_h5_to_aws(loaded_model):
    # Interpreted from 'Data Liam'
    from tensorflow.python.saved_model.builder import SavedModelBuilder
    from tensorflow.python.saved_model.signature_def_utils import predict_signature_def
    from tensorflow.python.saved_model import tag_constants
    
    model_version = '1'
    export_dir = 'export/Servo/' + model_version
    
    # Build the Protocol Buffer SavedModel at 'export_dir'
    builder = SavedModelBuilder(export_dir)
    
    # Create prediction signature to be used by TensorFlow Serving Predict API
    signature = predict_signature_def(
        inputs={"inputs": loaded_model.input}, outputs={"score": loaded_model.output})

    with tf.compat.v1.Session() as sess:
        init = tf.compat.v1.global_variables_initializer()
        sess.run(init)
        # Save the meta graph and variables
        builder.add_meta_graph_and_variables(
            sess=sess, tags=[tag_constants.SERVING], signature_def_map={"serving_default": signature})
        builder.save()
    
    # Package the export/ directory into a gzipped tarball for upload to S3
    import tarfile
    with tarfile.open('model.tar.gz', mode='w:gz') as archive:
        archive.add('export', recursive=True)
        
convert_h5_to_aws(model)
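One thing worth sanity-checking at this point: the SageMaker TensorFlow Serving container looks for the model under `export/Servo/<version>/` inside `model.tar.gz`. A small stdlib sketch (the helper name is mine) to list the archive members and confirm that layout before uploading:

```python
import tarfile

def check_tarball_layout(tar_path='model.tar.gz'):
    """List the tarball members and confirm the export/Servo/<version>/
    layout that the SageMaker TensorFlow Serving container expects."""
    with tarfile.open(tar_path, mode='r:gz') as archive:
        names = archive.getnames()
    has_layout = any(n.startswith('export/Servo/') for n in names)
    return names, has_layout
```

If `has_layout` comes back `False`, the endpoint will deploy but serve nothing useful, so this is a cheap check to run before `upload_data`.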

sagemaker_session = sagemaker.Session()
inputs = sagemaker_session.upload_data(path='model.tar.gz', key_prefix='model')

!touch train.py  # create an empty entry-point script (run from the notebook)
# the (default) IAM role
role = get_execution_role()
framework_version = tf.__version__

# Create Sagemaker model
from sagemaker.tensorflow.model import TensorFlowModel
sagemaker_model = TensorFlowModel(model_data = 's3://' + sagemaker_session.default_bucket() + '/model/model.tar.gz',
                                  role = role,
                                  framework_version = framework_version,
                                  entry_point = 'train.py')

predictor = sagemaker_model.deploy(initial_instance_count=1,
                                   instance_type='ml.m4.xlarge')

This deploys fine and saves as an endpoint. Then, I invoke the endpoint:

import boto3
from sagemaker.tensorflow import TensorFlowPredictor

runtime = boto3.client('runtime.sagemaker')
endpoint_name = 'endpoint-name-for-stackoverflow'

img = Image.open(photo_fp).resize((128,128))
image_array = np.array(img) / 255.
img_batch = np.expand_dims(image_array, axis=0)
predictor = TensorFlowPredictor(endpoint_name)
result = predictor.predict(data=img_batch)
print(result)
# {'predictions': [[0.199595317, 0.322404563, 0.209394112, 0.268606]]}

As you can see, the classifier predicts nearly equal probabilities for all four classes, which does not match the local predictions. This leads me to believe that something is going wrong in my deployment.
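Near-uniform class probabilities often mean the served model received input it doesn't recognize, so one low-cost check is whether the JSON serialization used by the endpoint round-trips the image batch unchanged. A minimal sketch (the helper is hypothetical, and it only rules out this one failure mode):

```python
import json
import numpy as np

def roundtrip_ok(img_batch, atol=1e-6):
    """Check that JSON-serializing the batch (as invoke_endpoint does)
    and decoding it reproduces the same shape and float values."""
    payload = json.dumps(img_batch.tolist())
    decoded = np.array(json.loads(payload))
    return decoded.shape == img_batch.shape and np.allclose(decoded, img_batch, atol=atol)
```

If this returns `True` for the batch being sent, the input pipeline is probably not the culprit and the export or serving environment becomes the prime suspect.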

I have tried uploading the model weights and the JSON model structure to SageMaker instead of the whole .h5 model, but that yielded the same results. I also used invoke_endpoint instead of the predictor API with the following code:

import json

payload = json.dumps(img_batch.tolist())
response = runtime.invoke_endpoint(EndpointName=endpoint_name,
                                   ContentType='application/json',
                                   Body=payload)
result = json.loads(response['Body'].read().decode())
print(result)
# {'predictions': [[0.199595317, 0.322404563, 0.209394112, 0.268606]]}

But yet again, the same results.
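For completeness, the Lambda function mentioned at the top would wrap the same `invoke_endpoint` call. A hedged sketch, assuming the endpoint name from above and that the caller passes the preprocessed pixel batch as `event['instances']` (both are assumptions, not part of the original post):

```python
import json

def build_payload(pixels):
    """Serialize a nested-list image batch as the JSON body the
    endpoint expects (ContentType: application/json)."""
    return json.dumps(pixels)

def lambda_handler(event, context):
    # boto3 is imported lazily so the module also loads outside AWS.
    import boto3
    runtime = boto3.client('runtime.sagemaker')
    # Assumes the caller passes the preprocessed batch as event['instances'].
    response = runtime.invoke_endpoint(
        EndpointName='endpoint-name-for-stackoverflow',
        ContentType='application/json',
        Body=build_payload(event['instances']))
    return json.loads(response['Body'].read().decode())
```

Note that resizing and rescaling still have to happen before (or inside) the handler, exactly as in the local snippet, or the endpoint sees raw pixel values.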

Any ideas why I'm getting different results from the SageMaker endpoint than on my local machine with the same model? Thanks!

EDIT: Found a solution. The problem was with the TensorFlowModel framework_version argument. I changed framework_version to '1.12', installed TF 1.12 in the SageMaker Jupyter instance, and retrained my model locally using TF 1.12. I'm not totally sure why this works, but all of the blogs I found (e.g. this one) used 1.12. Hope this helps.
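Since the fix boils down to pinning the training and serving TensorFlow versions to each other, a tiny hypothetical helper can make that check explicit before deploying (comparing major.minor only, since patch releases usually don't matter here):

```python
def same_major_minor(train_version, serving_version):
    """True when two version strings share major.minor,
    e.g. '1.12.0' and '1.12' both map to ('1', '12')."""
    def key(v):
        return tuple(v.split('.')[:2])
    return key(train_version) == key(serving_version)
```

Calling something like `same_major_minor(tf.__version__, framework_version)` before `deploy` would have caught this mismatch early.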

For the benefit of the community, providing the solution in the answer section:

The problem was with the TensorFlowModel framework_version argument. After changing framework_version to '1.12', installing TF 1.12 in the SageMaker Jupyter instance, and retraining the model locally with TF 1.12, the endpoint returned the same results as the local model. (paraphrased from Peter Van Katwyk)
