Pass a config file to a SageMaker training program

Setup:

I am using the bring-your-own-container option for AWS SageMaker Training. In the Dockerfile, I set the SAGEMAKER_PROGRAM variable to tools/train.py, since I am working with the mmaction2 repo.

A user then runs:

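from sagemaker.pytorch import PyTorch

# bucket, prefix_output and sagemaker_session are assumed to be defined elsewhere
estimator = PyTorch(
    role='sagemaker_role',
    image_uri="path_in_ecr",  # bring-your-own-container image in ECR
    instance_count=1,
    instance_type="ml.g4dn.xlarge",
    volume_size=40,  # size of the attached storage volume, in GB
    output_path=f"s3://{bucket}/{prefix_output}/",
    sagemaker_session=sagemaker_session,
    max_run=3600 * 2,  # stop the job after at most 2 hours
)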

estimator.fit()

on an EC2 machine where, say, they have a config at /home/ubuntu/train_config_mmaction2.py.

Problem: mmaction2 requires a config file as input that specifies the training configuration. How can I pass a file to SageMaker Training so that it is copied from the calling EC2 instance to the training instance and used as a command-line argument for the SAGEMAKER_PROGRAM defined in the Dockerfile?

I tried using the entry_point and source_dir arguments of the PyTorch class, with the entry point and the config both placed in the source_dir directory so that the config would be copied. However, this creates a dependency on having the entry point present locally for each run. Is there a way to do this without that dependency?

Hey, you can do either of the following:

  1. Keep the config file in source_dir, along with the entry point. This doesn't have to be local; it can also come from a git repo, as indicated here: blog, demo
  2. Bring the config file in via S3, using SageMaker input or checkpoint channels (doc)
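
As a rough sketch of option 2, assuming a channel named "config" (the channel name, the helper function names, and the idea of a thin wrapper around tools/train.py are all illustrative assumptions, not part of the original setup): the caller uploads the file with the session's upload_data method and passes the returned S3 URI to fit() as an input channel. Inside the container, SageMaker places each channel under /opt/ml/input/data/<channel>, and the SageMaker training toolkit also exposes that directory as SM_CHANNEL_<NAME>. mmaction2's tools/train.py takes the config path as its first positional argument.

```python
import os
import sys


def upload_config_and_fit(estimator, session, local_config, bucket, key_prefix="config"):
    """Caller side (e.g. the EC2 machine): upload the config, then start training."""
    # Session.upload_data returns the S3 URI of the uploaded file.
    s3_uri = session.upload_data(path=local_config, bucket=bucket, key_prefix=key_prefix)
    # Each key of the dict passed to fit() becomes an input channel ("config" is our choice).
    estimator.fit({"config": s3_uri})
    return s3_uri


def resolve_config(filename, channel="config"):
    """Container side: locate the file SageMaker downloaded for the channel."""
    default_dir = os.path.join("/opt/ml/input/data", channel)
    channel_dir = os.environ.get("SM_CHANNEL_" + channel.upper(), default_dir)
    return os.path.join(channel_dir, filename)


def build_train_command(filename, channel="config"):
    """Command line that hands the downloaded config to tools/train.py."""
    return [sys.executable, "tools/train.py", resolve_config(filename, channel)]
```

With this approach nothing besides the config needs to exist on the calling machine. A small wrapper script baked into the image (used as SAGEMAKER_PROGRAM in place of tools/train.py) could run the command returned by build_train_command, for example via subprocess.run.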
