
When using a TensorFlow Estimator in AWS SageMaker, will the training job automatically save the model artifacts to /opt/ml/model?

I am trying to train a TensorFlow Estimator and upload the created model artifacts to S3. The training job completes successfully, but I get a warning saying "No model artifact is saved under path /opt/ml/model. Your training job will not save any model files to S3." This becomes an issue when I try to deploy the model in SageMaker. My understanding was that the Estimator object automatically saves the model artifacts to /opt/ml/model.

The main idea is to do training, deployment, and testing all through SageMaker. I have an entry.py script, a training script (my_custom_code_3.py), and the actual SageMaker notebook that instantiates the Estimator object:

# BEGINNING OF TRAINING SCRIPT
# my_custom_code_3.py
import tensorflow as tf
import numpy as np

# import sagemaker
# from sagemaker.predictor import csv_serializer
# from pyathena import connect
# from pyathena.pandas_cursor import PandasCursor

def train_model(init_learn):

    b = tf.Variable([.3], tf.float32, name="b")
    W = tf.Variable([-.3], tf.float32, name="w")
    x = tf.placeholder(tf.float32, name="x")
    y = tf.placeholder(tf.float32, name="y")

    X_train = [4, 0, 12]
    Y_train = [5, 9, -3]

    linear_model = W*x + b   # y = W*x + b; 5 = -1*4 + 9; 9 = -1*0 + 9; -3 = -1*12 + 9

    model_delta = tf.square(linear_model - y)
    loss = tf.reduce_sum(model_delta)
    optimizer = tf.train.GradientDescentOptimizer(init_learn).minimize(loss)
    init = tf.global_variables_initializer()

    with tf.Session() as sess:
        sess.run(init)
        for i in range(1000):
            feed_dict_batch = {x: X_train, y: Y_train}
            sess.run(optimizer, feed_dict=feed_dict_batch)

        W_value, b_value = sess.run([W, b])
        print(W_value)
        print(b_value)

if __name__ == '__main__':
    train_model(0.001)
# END OF TRAINING SCRIPT


# BEGINNING OF ENTRY.PY
# entry.py
import argparse, os
from my_custom_code_3 import train_model

if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument(
        '--model_dir',
        type=str,
        default=os.environ['SM_MODEL_DIR'])
    parser.add_argument(
        '--init_learn',
        type=float)
    args = parser.parse_args()
    train_model(args.init_learn)
# END OF ENTRY.PY

# BEGINNING OF SAGEMAKER_RUN3.ipynb
import sagemaker
from sagemaker.tensorflow import TensorFlow

hyperparameters = {
    'init_learn': 0.001
    #'model_dir': 's3://stats-interns/tfmodel'
}

role = sagemaker.get_execution_role()
source_dir = 's3://stats-interns/my-test-3/my-test-3.tar.gz'
estimator = TensorFlow(
    entry_point='entry.py',
    source_dir=source_dir,
    model_dir='s3://tfmodel',
    # change instance type to 'local' to test locally first (requires docker-compose)
    train_instance_type='ml.m5.large',
    train_instance_count=1,
    hyperparameters=hyperparameters,
    role=role,
    py_version='py3',
    framework_version='1.12.0',
    script_mode=True)
estimator.fit()

I expected the artifacts to be saved to /opt/ml/model and then uploaded to the specified S3 location, s3://tfmodel. Instead, nothing is copied and I get the warning message "No model artifact is saved under path /opt/ml/model. Your training job will not save any model files to S3."

At the end of training, SageMaker copies whatever it finds in /opt/ml/model to S3. So to get your model saved, you need to save it yourself to /opt/ml/model first. For example:

tf.contrib.saved_model.save_keras_model(model, '/opt/ml/model')

The Estimator doesn't save the model, you have to do it yourself :) You also need to make sure that you save the model in the right place. With script mode, SageMaker passes the output location to your code in os.environ['SM_MODEL_DIR'], so just use that value and you'll be fine.
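
For the placeholder-based TF 1.x model in the question, that could look like the sketch below. This is a minimal illustration, not code from the original post: it adds a model_dir parameter to train_model (entry.py would then call train_model(args.init_learn, args.model_dir)) and exports a SavedModel with tf.saved_model.simple_save while the session is still open.

# Hypothetical rework of the question's training script that exports a
# SavedModel under SM_MODEL_DIR so SageMaker uploads it to S3.
import os
import tensorflow as tf

def train_model(init_learn, model_dir):
    b = tf.Variable([.3], tf.float32, name="b")
    W = tf.Variable([-.3], tf.float32, name="w")
    x = tf.placeholder(tf.float32, name="x")
    y = tf.placeholder(tf.float32, name="y")

    linear_model = W*x + b
    loss = tf.reduce_sum(tf.square(linear_model - y))
    optimizer = tf.train.GradientDescentOptimizer(init_learn).minimize(loss)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(1000):
            sess.run(optimizer, feed_dict={x: [4, 0, 12], y: [5, 9, -3]})

        # Export while the session still holds the trained variables.
        # The 'model/1' suffix gives the numbered-version layout that
        # TensorFlow Serving expects.
        tf.saved_model.simple_save(
            sess,
            os.path.join(model_dir, 'model/1'),
            inputs={'x': x},
            outputs={'prediction': linear_model})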

If you want to deploy with the SageMaker TensorFlow container, your model needs to be in TensorFlow Serving format. This should do the trick:

import argparse, os
import tensorflow as tf
from tensorflow.keras import backend as K

parser = argparse.ArgumentParser()
parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
args = parser.parse_args()
model_dir = args.model_dir
...
# Export from the live Keras session in TensorFlow Serving layout.
sess = K.get_session()
tf.saved_model.simple_save(
        sess,
        os.path.join(model_dir, 'model/1'),
        inputs={'inputs': model.input},
        outputs={t.name: t for t in model.outputs})
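
Once the SavedModel lands in S3, deploying from the same notebook is one call on the estimator. A minimal sketch (the hosting instance type here is an assumption, not from the original post):

# Deploy the trained model behind a real-time SageMaker endpoint;
# 'ml.m5.large' is just an example instance type.
predictor = estimator.deploy(initial_instance_count=1, instance_type='ml.m5.large')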

Plenty of examples here: https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk
