
Pass a config file to SageMaker training program

Setup:

I have gone with the bring-your-own-container option for AWS SageMaker Training. In the Dockerfile, I set the SAGEMAKER_PROGRAM variable to point to tools/train.py, since I am working with the mmaction2 repo.

So a user runs

from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    role='sagemaker_role',
    image_uri="path_in_ecr",
    instance_count=1,
    instance_type="ml.g4dn.xlarge",
    volume_size=40,
    output_path=f"s3://{bucket}/{prefix_output}/",
    sagemaker_session=sagemaker_session,
    max_run=3600 * 2,
)

estimator.fit()

on an EC2 machine where, say, they have a config at /home/ubuntu/train_config_mmaction2.py.

Problem: since mmaction2 requires a config file as input that specifies the training configuration, how can I pass a file to SageMaker Training so that it is copied from the calling EC2 instance to the training instance and used as a command-line argument for the SAGEMAKER_PROGRAM defined in the Dockerfile?

I tried using the entry_point and source_dir arguments of the PyTorch class, with both the entry point and the config inside the source_dir directory so that the config would be copied. However, this requires the entry point to be present locally for every run. I am wondering if there is a way to do this without that dependency.
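For the source_dir route, the SageMaker Python SDK estimators also accept a git_config parameter, so entry_point and source_dir can point into a Git repository instead of a pre-existing local checkout (the SDK clones the repository itself at launch time). A minimal sketch, assuming a hypothetical repository URL, a hypothetical wrapper script train_wrapper.py, and the config committed next to it in a hypothetical sagemaker/ directory:

```python
from typing import Optional


def git_code_kwargs(repo_url: str, branch: str, entry_point: str,
                    source_dir: Optional[str] = None) -> dict:
    """Return code-location keyword arguments for a SageMaker framework
    estimator (e.g. sagemaker.pytorch.PyTorch) so the code, and any config
    committed alongside it, comes from Git rather than a local copy."""
    kwargs = {
        # With git_config set, entry_point/source_dir are paths inside the
        # cloned repository, not paths on the calling machine.
        "entry_point": entry_point,
        "git_config": {"repo": repo_url, "branch": branch},
    }
    if source_dir is not None:
        kwargs["source_dir"] = source_dir
    return kwargs


# Hypothetical repository and file names, for illustration only.
code_kwargs = git_code_kwargs(
    repo_url="https://github.com/example-org/mmaction2-sagemaker.git",
    branch="main",
    entry_point="train_wrapper.py",
    source_dir="sagemaker",
)
```

These would be unpacked into the existing call, e.g. PyTorch(**code_kwargs, role=..., image_uri="path_in_ecr", ...). Pinning a "commit" key in git_config instead of a branch is an option if runs need to be reproducible.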

Hey, you can do a couple of things:

  1. Either keep the config file in the source_dir, along with the entry point. This doesn't have to be local; it can also come from a Git repo, as shown here: blog, demo
  2. Or bring the config file in via S3, using SageMaker input or checkpoint channels (doc).
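The S3 input-channel option can be sketched in two halves: the caller uploads the local config and passes it as a channel, and a small wrapper (which SAGEMAKER_PROGRAM would point to instead of tools/train.py) turns the downloaded file into the positional config argument mmaction2's tools/train.py expects. This is a sketch under assumptions: the channel name "config", the install path /opt/mmaction2, and the wrapper itself are hypothetical; the estimator and session objects are the SDK's PyTorch estimator and sagemaker.Session, passed in rather than imported.

```python
import glob
import os
import subprocess
import sys
from typing import List, Mapping

# --- Caller side (e.g. on the EC2 machine) ---------------------------------

def upload_config_and_fit(estimator, session, local_config: str,
                          bucket: str, prefix: str) -> str:
    """Upload a local mmaction2 config to S3 and start training with it
    attached as an input channel named "config" (hypothetical name)."""
    # Session.upload_data returns the S3 URI of the uploaded file.
    config_uri = session.upload_data(path=local_config, bucket=bucket,
                                     key_prefix=f"{prefix}/config")
    # Each key of the inputs dict becomes a channel, downloaded to
    # /opt/ml/input/data/<channel name>/ on the training instance.
    estimator.fit({"config": config_uri})
    return config_uri


# --- Container side (the script SAGEMAKER_PROGRAM would point to) ----------

# Assumption: where the image has mmaction2's training script installed.
TRAIN_SCRIPT = "/opt/mmaction2/tools/train.py"


def build_train_command(env: Mapping[str, str],
                        train_script: str = TRAIN_SCRIPT) -> List[str]:
    """Turn the single .py file in the "config" channel into the positional
    config argument of mmaction2's tools/train.py."""
    # The SageMaker training toolkit exposes each channel as SM_CHANNEL_<NAME>.
    channel_dir = env.get("SM_CHANNEL_CONFIG", "/opt/ml/input/data/config")
    configs = sorted(glob.glob(os.path.join(channel_dir, "*.py")))
    if len(configs) != 1:
        raise FileNotFoundError(
            f"expected exactly one .py config in {channel_dir}, "
            f"found {len(configs)}")
    # Write logs/checkpoints to the model dir so SageMaker uploads them
    # to output_path when the job finishes.
    work_dir = env.get("SM_MODEL_DIR", "/opt/ml/model")
    return [sys.executable, train_script, configs[0], "--work-dir", work_dir]


def main() -> None:
    # Run mmaction2's training and fail the job if it fails.
    subprocess.run(build_train_command(os.environ), check=True)
```

With this, the user's call would become upload_config_and_fit(estimator, sagemaker_session, "/home/ubuntu/train_config_mmaction2.py", bucket, prefix_output), and the wrapper file would end with `if __name__ == "__main__": main()`. A wrapper is one way to handle the config being a positional argument, since SageMaker hyperparameters reach the program as `--name value` flags.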
