
AWS SageMaker Estimator cannot access the internet

I'm trying to run a training job on a SageMaker TensorFlow estimator. Before starting the training job I need to install some dependencies. As suggested in the SageMaker Python SDK documentation, I put a requirements.txt file in the code root directory.

The training job fails while trying to install these dependencies, with the following error:

sagemaker.exceptions.UnexpectedStatusException: Error for Training job tensorflow-training-2021-09-15-10-34-05-979: Failed. Reason: AlgorithmError: InstallRequirementsError:
Command "/usr/local/bin/python3.7 -m pip install -r requirements.txt"
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f41d0448550>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/efficientnet/

I've specified the subnet and security group in the estimator constructor:

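from os import environ

from sagemaker.tensorflow import TensorFlow

# job_dir, role and instance_type are assumed to be defined earlier in the script.
estimator = TensorFlow(
    entry_point="train.py",
    source_dir=job_dir,
    role=role,
    instance_count=1,
    instance_type=instance_type,
    py_version="py37",
    framework_version="2.4",
    subnets=[environ["SUBNET_ID"]],
    security_group_ids=[environ["SECURITY_GROUP_ID"]],
)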

The security group allows all outbound IPv4 traffic, and the subnet is public and has an internet gateway.

Moreover, I've tested this networking configuration by launching an EC2 instance in the same subnet and security group, connecting via SSH, and successfully installing a pip package.

I can't understand why the SageMaker instance can't connect to pypi.org, and I can't find a way to debug this issue.
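One possible way to debug this, sketched under assumptions: temporarily remove requirements.txt so the job gets as far as the entry point, and have train.py log whether it can open a TCP connection to pypi.org. The helper name `can_reach` is hypothetical and not part of the SageMaker SDK; it only uses the Python standard library.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS resolution failures.
        return False


# Hypothetical usage near the top of train.py:
#     print("pypi.org reachable:", can_reach("pypi.org"))
```

The printed line would appear in the training job's CloudWatch logs, which separates a networking problem from a pip problem.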

It could be that the ENIs SageMaker launches in your subnet only have a private IP, in which case they need a NAT to communicate with the internet. An internet gateway alone is not enough for an interface without a public IP.
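To check whether this applies, one could look at where the subnet's default route points. The sketch below is an illustration, not an official diagnostic: the function names and the three labels are hypothetical. It assumes input shaped like the "RouteTables" list returned by the EC2 DescribeRouteTables API, and `fetch_route_tables` assumes boto3 is installed and AWS credentials are configured.

```python
from typing import Any, Dict, List


def default_route_targets(route_tables: List[Dict[str, Any]]) -> List[str]:
    """Collect the target of every 0.0.0.0/0 route in the given route tables."""
    targets = []
    for table in route_tables:
        for route in table.get("Routes", []):
            if route.get("DestinationCidrBlock") != "0.0.0.0/0":
                continue
            # NAT gateways appear under NatGatewayId ("nat-..."),
            # internet gateways under GatewayId ("igw-...").
            targets.append(
                route.get("NatGatewayId") or route.get("GatewayId") or "unknown"
            )
    return targets


def classify_subnet(route_tables: List[Dict[str, Any]]) -> str:
    """Label the subnet by where its default route points (hypothetical labels)."""
    targets = default_route_targets(route_tables)
    if any(t.startswith("nat-") for t in targets):
        return "private-with-nat"
    if any(t.startswith("igw-") for t in targets):
        return "public-igw-only"
    return "no-default-route"


def fetch_route_tables(subnet_id: str) -> List[Dict[str, Any]]:
    """Fetch route tables explicitly associated with the subnet."""
    import boto3  # imported lazily so the helpers above work without AWS access

    ec2 = boto3.client("ec2")
    response = ec2.describe_route_tables(
        Filters=[{"Name": "association.subnet-id", "Values": [subnet_id]}]
    )
    # An empty list means the subnet has no explicit association and
    # falls back to the VPC's main route table, which this sketch does not look up.
    return response["RouteTables"]
```

A result of "public-igw-only" for the subnet passed to the estimator would be consistent with the explanation above: the EC2 test instance presumably had a public IP, while the training job's interfaces do not. In that case, placing the job in a private subnet whose default route goes through a NAT gateway is the usual setup.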

