
AWS Glue Connection to S3 fails

I want to create a Glue job to transfer data from RDS PostgreSQL into S3.
To do this, I did the following:

  1. Created a Glue connection to the RDS PostgreSQL instance
  2. Set up a VPC S3 endpoint and a NAT gateway (because the test connection to RDS was failing due to the VPC S3 endpoint and NAT gateway not being present)
  3. Created a Glue "Network" connection to S3

When I try to test the connection, I get the following:
[screenshot of the connection test result]

A couple of things to note:

  • For the sake of making it work at all, the role used for testing the connection has FullAccess policies for Glue and S3.
  • The security group used in the connection is the same one as in the RDS PostgreSQL connection (i.e. it allows TCP connections on port 5432 from my IP, all TCP from itself, and all TCP from all IPv4).
  • When I open the logs referenced in the screenshot in CloudWatch, AWS throws an error saying that the logs in question don't exist.
  • When I run a job without using the S3 connection, the job ends with the error "An error occurred while calling o96.pyWriteDynamicFrame. connect timed out". This error is thrown from Java code by a method that sends an HTTP request to the bucket specified in the job, apparently because Spark is unable to reach the bucket via HTTP.

P.S. I'm very new to AWS, having only a little experience with Azure before.

It's possible that the security group configuration is not correct for the Glue S3 connection. The security group used for the S3 connection needs both ingress and egress rules so that inbound and outbound traffic is allowed. A simple way to configure this is to allow ingress/egress for all protocols.

Example ingress configuration:

  • from port: 0
  • to port: 0
  • protocol: -1 (all protocols)
  • self: true

Example egress configuration:

  • from port: 0
  • to port: 0
  • protocol: -1 (all protocols)
  • CIDR IP: "0.0.0.0/0"

I recommend creating a separate security group for the Glue S3 connection and not reusing the one for RDS.
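
For illustration, here is a minimal sketch of how the ingress and egress rules above could be expressed with boto3's EC2 client. The helper names (`self_ingress_rule`, `open_egress_rule`, `apply_rules`) and the security group ID are hypothetical, and the sketch assumes the dedicated security group already exists and that AWS credentials are configured. The port range is left out because it is ignored when the protocol is `-1`.

```python
from typing import Any, Dict


def self_ingress_rule(group_id: str) -> Dict[str, Any]:
    """Ingress: all protocols, but only from members of the same group (self: true)."""
    return {
        "IpProtocol": "-1",  # -1 = all protocols; any port range is ignored
        "UserIdGroupPairs": [{"GroupId": group_id}],
    }


def open_egress_rule() -> Dict[str, Any]:
    """Egress: all protocols to any IPv4 address."""
    return {
        "IpProtocol": "-1",
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
    }


def apply_rules(group_id: str) -> None:
    """Attach both rules to an existing security group (hypothetical helper)."""
    # Imported here so the rule builders above work without boto3 installed.
    import boto3
    from botocore.exceptions import ClientError

    ec2 = boto3.client("ec2")
    calls = [
        (ec2.authorize_security_group_ingress, self_ingress_rule(group_id)),
        (ec2.authorize_security_group_egress, open_egress_rule()),
    ]
    for authorize, rule in calls:
        try:
            authorize(GroupId=group_id, IpPermissions=[rule])
        except ClientError as err:
            # A new VPC security group usually already has an allow-all
            # egress rule, so a duplicate is not treated as a failure.
            if err.response["Error"]["Code"] != "InvalidPermission.Duplicate":
                raise
```

Calling `apply_rules("sg-0123456789abcdef0")` (a placeholder ID) would add the self-referencing ingress rule and make sure the open egress rule is present on that group.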
