
Upload custom file to S3 from training script in training component of AWS SageMaker Pipeline

I am new to SageMaker, and I have created a pipeline from the SageMaker notebook, consisting of training and deployment components. In the training script, we can upload the model to S3 via SM_MODEL_DIR. But now I want to upload the classification report to S3 as well. I tried this code, but it fails saying this is not a proper S3 bucket.

import os

import boto3
import pandas as pd

df_classification_report = pd.DataFrame(class_report).transpose()
classification_report_file_name = os.path.join(args.output_data_dir,
                                               f"{args.eval_model_name}_classification_report.csv")
df_classification_report.to_csv(classification_report_file_name)

# instantiate an S3 client and upload the classification report to S3
s3 = boto3.resource('s3')
print(f"classification_report is being uploaded to s3- {args.model_dir}")
s3.meta.client.upload_file(classification_report_file_name, args.model_dir,
                           f"{args.eval_model_name}_classification_report.csv")

And the error:

Invalid bucket name "/opt/ml/output/data": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"

Can anybody help? I really appreciate any help you can provide.

SageMaker Training Jobs compress any files located in /opt/ml/model, which is the value of SM_MODEL_DIR, and upload them to S3 automatically. You could save your file to SM_MODEL_DIR; your classification report will then be uploaded to S3 inside the model tarball.
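A minimal sketch of that approach, assuming the class_report dict and args.eval_model_name from the question; SM_MODEL_DIR is set by SageMaker inside the training container:

import os

import pandas as pd

# /opt/ml/model inside the training container; everything saved here is
# packaged into model.tar.gz and uploaded to S3 when the job finishes
model_dir = os.environ["SM_MODEL_DIR"]

df_classification_report = pd.DataFrame(class_report).transpose()
df_classification_report.to_csv(
    os.path.join(model_dir, f"{args.eval_model_name}_classification_report.csv")
)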

The upload_file() function requires you to pass an S3 bucket name, not a local path (which is why your error shows /opt/ml/output/data being rejected as a bucket name). You could also manually specify an S3 bucket in your code to upload the file to:

s3.meta.client.upload_file(classification_report_file_name, <YourS3Bucket>,
                            f"{args.eval_model_name}_classification_report.csv")

You can save non-model artifacts, such as reports, to output_data_dir; SageMaker uploads everything written there to the training job's output S3 location. See the SageMaker documentation on training environment variables.

parser.add_argument(
    "--output_data_dir",
    type=str,
    default=os.environ.get("SM_OUTPUT_DATA_DIR"),
    help="Directory to save output data artifacts.",
)

If you want the artifacts to be packaged with the model files, then follow @Marc's answer. That may make sense for a report that pertains to a specific model, though capturing it in a model registry makes more sense to me (see the sketch below).
Note that these additional artifacts would be carried over if you deploy the model to an endpoint, which might confuse the inference runtime's model-loading code.
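For the registry route, a sketch using ModelMetrics from the SageMaker Python SDK; the S3 URI is a hypothetical location where you would have uploaded the report first (model-quality reports are typically JSON):

from sagemaker.model_metrics import MetricsSource, ModelMetrics

# hypothetical S3 URI of a previously uploaded evaluation report
model_metrics = ModelMetrics(
    model_statistics=MetricsSource(
        s3_uri="s3://my-ml-reports-bucket/evaluation/evaluation.json",
        content_type="application/json",
    )
)

# pass the metrics when registering the model package, e.g.:
# model.register(
#     model_package_group_name="my-model-group",
#     model_metrics=model_metrics,
#     content_types=["text/csv"],
#     response_types=["text/csv"],
#     inference_instances=["ml.m5.large"],
#     transform_instances=["ml.m5.large"],
# )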
