
What is an optimal setting for a SageMaker batch job?

Based on the AWS documentation, I've set up a batch inference job. However, once we choose the instance type and instance count (the bare minimum), does SageMaker choose an optimal plan to process the job? For example, if there is more than one input file and resources are available, can those files be processed in parallel?

from sagemaker.transformer import Transformer

tr = Transformer(model_name='custom_model', instance_count=2, instance_type='ml.m4.xlarge')

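Batch transform partitions the Amazon S3 objects in the input by key and maps each object to an instance. With multiple input files, one instance might process input1.csv while another processes input2.csv, so yes, files can be processed in parallel. Please check the AWS Batch Transform documentation for details.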
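To make the per-object mapping concrete, here is a toy sketch. It is not SageMaker's actual assignment logic, which is internal; the round-robin rule and the `assign_objects` helper are assumptions made purely for illustration. The point it shows is that a whole object goes to a single instance, so one input file cannot keep two instances busy.

```python
from collections import defaultdict


def assign_objects(keys, instance_count):
    """Toy round-robin mapping of S3 object keys to instance indexes.

    Assumption: the real partitioning is internal to SageMaker; this only
    illustrates that each object is handled by exactly one instance.
    """
    assignment = defaultdict(list)
    for i, key in enumerate(sorted(keys)):
        assignment[i % instance_count].append(key)
    # List every instance, including idle ones, so gaps are visible.
    return {n: assignment.get(n, []) for n in range(instance_count)}


# Two files, two instances: both instances get work.
print(assign_objects(["input1.csv", "input2.csv"], 2))
# One file, two instances: the second instance has nothing to do.
print(assign_objects(["input1.csv"], 2))
```

With a single input file, the second instance in this sketch stays idle, which is why splitting the input into several files matters when instance_count is greater than 1.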

When each input file contains many records, you can set BatchStrategy to MultiRecord (the accepted values are MultiRecord and SingleRecord; there is no MultiLine value) and set SplitType to Line, so that several records are sent in each request, which speeds up processing.

General guideline: split the input so that the number of files in S3 is a multiple of the number of instances. If there are fewer files than instances, the extra instances sit idle. If MaxConcurrentTransforms is set to 0 or left unset, Amazon SageMaker checks the optional execution-parameters endpoint of your container to determine the settings for your chosen algorithm; if that endpoint is not enabled, the default value is 1.
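As a rough sketch of how these settings fit together, the helper below builds a request dictionary using the parameter names of the CreateTransformJob API (BatchStrategy, MaxConcurrentTransforms, MaxPayloadInMB, SplitType, and so on). The helper name `build_transform_request`, the job name, and the S3 URIs are hypothetical placeholders, and the text/csv content type is an assumption about the input. Nothing here calls AWS.

```python
def build_transform_request(job_name, model_name, input_uri, output_uri,
                            instance_type="ml.m4.xlarge", instance_count=2,
                            max_concurrent_transforms=0, max_payload_mb=6):
    """Build a CreateTransformJob request dict (hypothetical helper)."""
    request = {
        "TransformJobName": job_name,
        "ModelName": model_name,
        # MultiRecord packs as many records as fit in MaxPayloadInMB per request.
        "BatchStrategy": "MultiRecord",
        "MaxPayloadInMB": max_payload_mb,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_uri}
            },
            "ContentType": "text/csv",  # assumption: CSV input
            "SplitType": "Line",        # one record per line
        },
        "TransformOutput": {"S3OutputPath": output_uri, "AssembleWith": "Line"},
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    }
    # Leave MaxConcurrentTransforms unset when 0 so SageMaker can consult the
    # container's execution-parameters endpoint instead.
    if max_concurrent_transforms > 0:
        request["MaxConcurrentTransforms"] = max_concurrent_transforms
    return request


request = build_transform_request(
    "example-batch-job",              # hypothetical job name
    "custom_model",
    "s3://example-bucket/input/",     # hypothetical input prefix
    "s3://example-bucket/output/",    # hypothetical output path
)
print(request["BatchStrategy"], request["TransformResources"])
```

If you use boto3, a dictionary like this could be passed as `boto3.client("sagemaker").create_transform_job(**request)`; the SageMaker Python SDK exposes the same settings on Transformer and its transform() method.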
