Change model file save location on AWS SageMaker Training Job

I am trying to run a custom Python/sklearn SageMaker script on AWS, basically learning from these examples: https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/scikit_learn_randomforest/Sklearn_on_SageMaker_end2end.ipynb

Everything works fine if I define the arguments, train the model and write the output file:

import argparse
import os
import joblib

parser = argparse.ArgumentParser()
parser.add_argument('--model-dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
parser.add_argument('--test', type=str, default=os.environ.get('SM_CHANNEL_TEST'))
args = parser.parse_args()

# train the model...

joblib.dump(model, os.path.join(args.model_dir, "model.joblib"))

And I launch the job with:

aws_sklearn.fit({'train': 's3://path/to/train', 'test': 's3://path/to/test'}, wait=False)

In this case the model gets stored in a different, auto-generated bucket, which I do not want. I want the output (.joblib file) in the same S3 bucket I took the data from. So I added the parameter model-dir:

aws_sklearn.fit({'train': 's3://path/to/train', 'test': 's3://path/to/test', 'model-dir': 's3://path/to/model'}, wait=False)

But this results in an error: FileNotFoundError: [Errno 2] No such file or directory: 's3://path/to/model/model.joblib'

The same happens if I hardcode the output path inside the training script.

So the main question is: how can I get the output file into the bucket of my choice?

You can use the output_path parameter when you define the estimator. If you use model_dir, I guess you have to create that bucket beforehand, but you have the advantage that artifacts can be saved in real time during training (if the instance has permissions on S3). You can take a look at my repo for this specific case.
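
As a rough illustration of both options, here is a minimal sketch rather than a definitive setup. It assumes the SageMaker Python SDK v2 and boto3 are available. The script name train.py, the instance type, the framework version and all helper function names are hypothetical placeholders that are not in the question. The FileNotFoundError in the question arises because joblib.dump writes to the local filesystem and cannot open an s3:// path, so for the second option the file is dumped locally first and then uploaded.

```python
from urllib.parse import urlparse


def build_estimator(role, output_path):
    """Option 1: let SageMaker upload the model archive under output_path."""
    # Imported lazily so the pure helpers below work without the SDK installed.
    from sagemaker.sklearn.estimator import SKLearn

    return SKLearn(
        entry_point="train.py",       # hypothetical script name
        role=role,                    # IAM role ARN with access to the bucket
        instance_count=1,
        instance_type="ml.m5.large",  # assumed instance type
        framework_version="1.2-1",    # assumed scikit-learn container version
        output_path=output_path,      # e.g. "s3://path/to/model"
    )


def expected_artifact_uri(output_path, job_name):
    """Location of the archive SageMaker builds from SM_MODEL_DIR after the job."""
    return f"{output_path.rstrip('/')}/{job_name}/output/model.tar.gz"


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def upload_to_s3(local_path, s3_uri):
    """Option 2: upload a locally saved file from inside the training script.

    Requires the training instance's role to have write access to the bucket.
    """
    import boto3

    bucket, key = split_s3_uri(s3_uri)
    boto3.client("s3").upload_file(local_path, bucket, key)
```

With the first option the training script stays unchanged: it keeps writing model.joblib to args.model_dir (the local SM_MODEL_DIR), and the .joblib file ends up inside model.tar.gz under the chosen output_path rather than as a standalone object. With the second option the script would call joblib.dump on a local path and then upload_to_s3 with the desired s3:// destination.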

