
What is an optimal setting for a SageMaker batch job?

Based on the AWS documentation, I've set up a batch inference job. However, once we choose the instance type and instance count (the bare minimum), does SageMaker choose an optimal plan to process the job? For example, if there is more than one input file and resources are available, can those files be processed in parallel?

from sagemaker.transformer import Transformer

tr = Transformer(model_name='custom_model', instance_count=2, instance_type='ml.m4.xlarge')

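Batch transform partitions the Amazon S3 objects in the input by key and maps each object to an instance, so when there are multiple input files, different instances can process different files in parallel. Please check the SageMaker batch transform documentation for details.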

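When you have a lot of input data to process, you can set the BatchStrategy to MultiRecord (the valid values are MultiRecord and SingleRecord) and the SplitType to Line, so that each request carries as many records as fit within MaxPayloadInMB. This speeds up the processing.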
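As a minimal sketch of how this could look with the SageMaker Python SDK: the API accepts MultiRecord and SingleRecord as BatchStrategy values, and the SDK exposes them through the `strategy` argument of `Transformer`, with record splitting set through `split_type` on `transform()`. The S3 prefixes, the `text/csv` content type, and the `run_batch_job` helper are assumptions for illustration, and the model container has to accept several records per request for MultiRecord to work.

```python
# Hypothetical S3 locations; replace with your own bucket and prefixes.
INPUT_S3_PREFIX = "s3://my-bucket/batch-input/"
OUTPUT_S3_PREFIX = "s3://my-bucket/batch-output/"

# Constructor arguments for sagemaker.transformer.Transformer.
transformer_kwargs = {
    "model_name": "custom_model",
    "instance_count": 2,
    "instance_type": "ml.m4.xlarge",
    "strategy": "MultiRecord",   # BatchStrategy: pack several records per request
    "assemble_with": "Line",     # join the per-record outputs line by line
    "output_path": OUTPUT_S3_PREFIX,
}

# Arguments for Transformer.transform().
transform_kwargs = {
    "data": INPUT_S3_PREFIX,     # every object under this prefix is an input file
    "content_type": "text/csv",  # assumed input format
    "split_type": "Line",        # split each file into records, one per line
}


def run_batch_job():
    """Start the batch transform job (needs AWS credentials and an existing model)."""
    # Imported here so the configuration above can be inspected without the SDK.
    from sagemaker.transformer import Transformer

    tr = Transformer(**transformer_kwargs)
    tr.transform(**transform_kwargs)
    return tr
```

Calling `run_batch_job()` starts the job with two instances. Whether MultiRecord actually helps depends on how large the records are and on how the container handles multi-record payloads.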

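As a general guideline, the number of files in S3 to process should be a multiple of the number of workers/instances. Each S3 object is processed by a single instance, so with one input file and several instances, only one instance does the work and the rest stay idle. If MaxConcurrentTransforms is set to 0 or left unset, Amazon SageMaker checks the optional execution-parameters endpoint to determine the settings for your chosen algorithm; if that endpoint is not enabled, the default value is 1.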
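The sizing reasoning can be sketched in plain Python before submitting a job. The helper names are hypothetical, and the even split in `files_per_instance` is only an assumption for illustration, since SageMaker's actual key-to-instance mapping is internal. The last helper checks the documented limit that MaxConcurrentTransforms multiplied by MaxPayloadInMB cannot exceed 100 MB; in the Python SDK these correspond to the `max_concurrent_transforms` and `max_payload` arguments of `Transformer`.

```python
def idle_instances(num_files: int, instance_count: int) -> int:
    """Instances that receive no input, since one S3 object goes to one instance."""
    return max(0, instance_count - num_files)


def files_per_instance(num_files: int, instance_count: int) -> list:
    """Assumed even split of files over instances (illustrative only)."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    base, extra = divmod(num_files, instance_count)
    # The first `extra` instances take one additional file each.
    return [base + (1 if i < extra else 0) for i in range(instance_count)]


def payload_settings_ok(max_concurrent_transforms: int, max_payload_mb: int) -> bool:
    """Check MaxConcurrentTransforms * MaxPayloadInMB against the 100 MB limit."""
    return max_concurrent_transforms * max_payload_mb <= 100


if __name__ == "__main__":
    print(idle_instances(num_files=1, instance_count=2))
    print(files_per_instance(num_files=5, instance_count=2))
    print(payload_settings_ok(max_concurrent_transforms=4, max_payload_mb=6))
```

With one file and two instances, one instance is idle. With five files and two instances, the assumed split is uneven (three and two), which is why a file count that is a multiple of the instance count keeps the load balanced.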
