Delay in executing a Sagemaker.sklearn.processing.SKLearnProcessor.run job in SageMaker

Question

I use SageMaker's SKLearnProcessor.run to execute my training job. Between the time the processing job starts and the time the first line of code in my processing.py file is read, there is a delay of 4-5 minutes. Once the script starts executing, the job completes quickly regardless of how large the input file is, as expected from SageMaker's processing capabilities.

My question is: can I somehow reduce the time it takes to start executing my processing.py file?

import os

from sagemaker.processing import ProcessingInput, ProcessingOutput

# sklearn_job, bucket, code_prefix, input_prefix and output_prefix are defined earlier
sklearn_job.run(
    code=os.path.join('s3://', bucket, code_prefix, 'preprocessing_v2.py'),
    inputs=[
        ProcessingInput(
            input_name='raw1',
            source=os.path.join('s3://', bucket, input_prefix, 'file1.csv'),
            destination='/opt/ml/processing/input1'),
        ProcessingInput(
            input_name='raw2',
            source=os.path.join('s3://', bucket, input_prefix, 'file2.csv'),
            destination='/opt/ml/processing/input2'),
    ],
    outputs=[
        ProcessingOutput(
            output_name='sample_file',
            source='/opt/ml/processing/dataset',
            destination=os.path.join('s3://', bucket, output_prefix)),
    ],
    arguments=["--train_size", "0.8", "--test_size", "0.2"],
    wait=True,
    logs=True,
)

Answer 1

Thanks for posting. You can reduce the job duration by:

Using light Docker images (not an option if you use a managed image)
Using small datasets to minimize download times

Beyond that, you will still face a "cold start" of a few minutes while SageMaker launches and configures the compute cluster (with the Sklearn Estimator on CPU instances, I found it is rarely more than 1-2 minutes).

This "cold start" is a side effect of a good feature of the service, the transient nature of its compute clusters: every job execution runs on a new EC2 cluster (1 or N machines, depending on your configuration). This is good for security, scalability, and fault tolerance.
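To see how much of a job's wall-clock time is cold start rather than script execution, one option is to compare the timestamps that the DescribeProcessingJob API reports. The sketch below is not part of the original answer. It assumes that the gap between `CreationTime` and `ProcessingStartTime` is a reasonable proxy for the provisioning overhead, which may not match exactly the moment the first line of the script runs. The helper names (`startup_overhead`, `run_time`, `describe_job`) are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def startup_overhead(description: dict) -> timedelta:
    """Gap between job creation and the reported processing start.

    Assumption: this gap approximates the "cold start" discussed above.
    """
    return description["ProcessingStartTime"] - description["CreationTime"]


def run_time(description: dict) -> timedelta:
    """Gap between the reported processing start and processing end."""
    return description["ProcessingEndTime"] - description["ProcessingStartTime"]


def describe_job(job_name: str) -> dict:
    """Fetch the job description from SageMaker (needs AWS credentials)."""
    # Imported lazily so the two helpers above work without boto3 installed.
    import boto3

    client = boto3.client("sagemaker")
    return client.describe_processing_job(ProcessingJobName=job_name)
```

After a run with `wait=True`, something like `startup_overhead(describe_job(sklearn_job.latest_job.job_name))` should give the overhead for the most recent job, assuming `sklearn_job` is the processor from the question. Comparing this value across instance types or images shows whether a change actually shortens the startup phase.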

Answer 1 posted 2022-02-25 08:34:17
