Change model file save location on AWS SageMaker Training Job

Question

I am trying to run a custom Python/sklearn SageMaker script on AWS, basically learning from this example: https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/scikit_learn_randomforest/Sklearn_on_SageMaker_end2end.ipynb

Everything works fine if I define the arguments, train the model, and write the output file:

import argparse
import os

import joblib

parser = argparse.ArgumentParser()
parser.add_argument('--model-dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
parser.add_argument('--test', type=str, default=os.environ.get('SM_CHANNEL_TEST'))
args = parser.parse_args()

# train the model...

joblib.dump(model, os.path.join(args.model_dir, "model.joblib"))

And call the job with:

aws_sklearn.fit({'train': 's3://path/to/train', 'test': 's3://path/to/test'}, wait=False)

In this case the model gets stored in a different, auto-generated bucket, which I do not want. I want the output (the .joblib file) to land in the same S3 bucket I took the data from. So I added the parameter model-dir:

aws_sklearn.fit({'train': 's3://path/to/train', 'test': 's3://path/to/test', 'model-dir': 's3://path/to/model'}, wait=False)

But it results in an error: FileNotFoundError: [Errno 2] No such file or directory: 's3://path/to/model/model.joblib'

The same happens if I hardcode the output path inside the training script.

So the main question is: how can I get the output file in the bucket of my choice?

Answer 1

You can set the output_path parameter when you define the estimator. If you use model_dir instead, I guess you have to create that bucket beforehand, but you have the advantage that artifacts can be saved in real time during training (if the instance has permissions on S3). You can take a look at my repo for this specific case.
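
A minimal sketch of the output_path approach, assuming SageMaker Python SDK v2 and the SKLearn estimator used in the linked notebook. The error in the question comes from joblib.dump writing to the local filesystem, where an s3:// string is not a valid path; with output_path the script keeps writing to the local SM_MODEL_DIR and SageMaker uploads the packaged result when the job ends. The helper names, script name, instance type, framework version, and role below are placeholders for illustration, not values from the question.

```python
def s3_uri(bucket, prefix):
    """Join a bucket name and a key prefix into an s3:// URI."""
    return f"s3://{bucket}/{prefix.strip('/')}"


def build_estimator_kwargs(bucket, role):
    """Keyword arguments for sagemaker.sklearn.estimator.SKLearn (SDK v2 names)."""
    return {
        "entry_point": "train.py",        # hypothetical training script name
        "role": role,                     # IAM role with access to the bucket
        "instance_count": 1,
        "instance_type": "ml.m5.large",   # placeholder instance type
        "framework_version": "0.23-1",    # placeholder sklearn container version
        # Artifacts are uploaded under this prefix instead of the default bucket.
        "output_path": s3_uri(bucket, "path/to/model"),
    }


def launch_training(bucket, role):
    """Start the training job; requires the sagemaker package and AWS credentials."""
    # Imported here so the helpers above can be used without the SDK installed.
    from sagemaker.sklearn.estimator import SKLearn

    aws_sklearn = SKLearn(**build_estimator_kwargs(bucket, role))
    aws_sklearn.fit(
        {
            "train": s3_uri(bucket, "path/to/train"),
            "test": s3_uri(bucket, "path/to/test"),
        },
        wait=False,
    )
    return aws_sklearn
```

With this setup the training script itself does not change: it still calls joblib.dump into args.model_dir. The contents of that directory should then appear as model.tar.gz under <output_path>/<job-name>/output/, with model.joblib inside the archive rather than as a loose file.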

solution1
2 ACCEPTED 2021-01-13 13:47:25
