AWS SageMaker Estimator cannot access the internet
I'm trying to run a training job with a SageMaker TensorFlow estimator. Before the training job starts, I need to install some dependencies. As suggested in the SageMaker Python SDK documentation, I put a requirements.txt file in the root of the code directory (`source_dir`).

The training job fails while trying to install these dependencies, with the following error:
sagemaker.exceptions.UnexpectedStatusException: Error for Training job tensorflow-training-2021-09-15-10-34-05-979: Failed. Reason: AlgorithmError: InstallRequirementsError:
Command "/usr/local/bin/python3.7 -m pip install -r requirements.txt"
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f41d0448550>, 'Connection to pypi.org timed out. (connect timeout=15)')': /simple/efficientnet/
I've specified the subnet and security group in the estimator constructor:
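from os import environ

from sagemaker.tensorflow import TensorFlow

# job_dir, role and instance_type are defined elsewhere in the script
estimator = TensorFlow(
    entry_point="train.py",
    source_dir=job_dir,  # contains train.py and requirements.txt
    role=role,
    instance_count=1,
    instance_type=instance_type,
    py_version="py37",
    framework_version="2.4",
    # Run the training job inside this VPC subnet and security group
    subnets=[environ["SUBNET_ID"]],
    security_group_ids=[environ["SECURITY_GROUP_ID"]],
)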
The security group allows all outbound IPv4 traffic, and the subnet is public and has an internet gateway.
Moreover, I've tested this networking configuration by launching an EC2 instance in the same subnet and security group, connecting via SSH, and successfully installing a pip package.
I can't understand why the SageMaker instance can't connect to pypi.org, and I can't find a way to debug this issue.
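One way to narrow this down is to probe the network from inside the training container itself. Below is a minimal sketch, assuming requirements.txt is temporarily removed (or emptied) from `source_dir` so the job gets past the dependency install and actually reaches the entry point. The helper name `can_reach` is hypothetical; anything it prints ends up in the training job's CloudWatch logs like other stdout.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical debugging helper, meant to be called near the top of train.py.
    """
    try:
        # create_connection resolves the name and tries each resolved address
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (socket.gaierror), refused connections and timeouts
        return False
```

For example, `print("pypi.org reachable:", can_reach("pypi.org"))` at the start of train.py would be expected to print False when the container has no outbound route, while the same call on the EC2 test instance would print True.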
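It could be that you don't have a NAT: the ENIs that SageMaker launches in the subnet only have a private IP, i.e. they need a NAT to communicate with the internet. An internet gateway route alone only works for interfaces with a public IP, which would explain why the EC2 instance you reached over SSH could install packages while the training job cannot.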
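To check whether that is the case, the subnet's route table can be inspected. The sketch below assumes a route-table dict in the shape returned by boto3's `ec2.describe_route_tables()` (for example with the filter `{"Name": "association.subnet-id", "Values": [subnet_id]}`; a subnet with no explicit association uses the VPC's main route table instead). The function names are hypothetical, and routes through a NAT instance rather than a NAT gateway are not handled.

```python
from typing import Any, Mapping, Optional


def default_route_target(route_table: Mapping[str, Any]) -> Optional[str]:
    """Classify where a route table sends IPv4 default (0.0.0.0/0) traffic.

    `route_table` is assumed to be one entry of the "RouteTables" list returned
    by EC2 DescribeRouteTables. Returns "nat", "igw", "other", or None if the
    table has no IPv4 default route.
    """
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("NatGatewayId"):
            return "nat"
        if route.get("GatewayId", "").startswith("igw-"):
            return "igw"
        # e.g. a NAT instance (InstanceId) or a peering connection
        return "other"
    return None


def sagemaker_vpc_subnet_has_internet(route_table: Mapping[str, Any]) -> bool:
    # Training-job ENIs get only private IPs, so an internet gateway route is
    # not enough; this sketch only accepts a default route via a NAT gateway.
    return default_route_target(route_table) == "nat"
```

Under this assumption, the subnet from the question (default route to an internet gateway) would be classified as "igw" and fail the check; moving the job to a private subnet whose default route points at a NAT gateway in a public subnet is the usual setup.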