How to connect AWS Glue to a VPC, and access private resources?

I am trying to connect to services and databases running inside a VPC (in private subnets) from an AWS Glue job. The private resources should not be exposed publicly (e.g., by moving them to a public subnet or setting up public load balancers).

Unfortunately, AWS Glue doesn't seem to support running inside user-defined VPCs. AWS does provide something called Glue Database Connections which, when used with the Glue SDK, magically set up elastic network interfaces inside the specified VPC for Glue/Spark worker nodes. The network interfaces then tunnel traffic from Glue to a specific database inside the VPC. However, this requires the location and credentials of specific databases, and it is not clear if and when other traffic (e.g., a REST call to a service) is tunnelled through the VPC.

Is there a reliable way to set up a Glue -> VPC connection that will tunnel all traffic through a VPC?

You can create a Glue connection with the NETWORK connection type and use that connection in your Glue job. It will allow your job to call a REST API or any other resource within your VPC.

https://docs.aws.amazon.com/glue/latest/dg/connection-using.html

Network (designates a connection to a data source within an Amazon Virtual Private Cloud environment (Amazon VPC))
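As a rough illustration of the NETWORK connection described above, here is a minimal boto3 sketch. The connection name, subnet ID, security group ID, and availability zone are placeholders, the helper function names are made up for this example, and passing an empty ConnectionProperties map for a NETWORK connection is an assumption worth checking against the Glue API reference.

```python
def network_connection_input(name, subnet_id, security_group_ids, availability_zone):
    """Build the ConnectionInput payload for a Glue connection of type NETWORK."""
    return {
        "Name": name,
        "ConnectionType": "NETWORK",
        # Assumption: a NETWORK connection needs no JDBC URL or credentials,
        # so the properties map is left empty.
        "ConnectionProperties": {},
        "PhysicalConnectionRequirements": {
            "SubnetId": subnet_id,
            "SecurityGroupIdList": list(security_group_ids),
            "AvailabilityZone": availability_zone,
        },
    }


def create_network_connection(connection_input):
    """Register the connection with Glue (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the payload builder works without it

    glue = boto3.client("glue")
    glue.create_connection(ConnectionInput=connection_input)


# Placeholder values; substitute your own private subnet and security group.
example = network_connection_input(
    name="private-vpc-network",
    subnet_id="subnet-0123456789abcdef0",
    security_group_ids=["sg-0123456789abcdef0"],
    availability_zone="us-east-1a",
)
```

The connection is then attached to the job, for example through the Connections parameter of create_job (Connections={"Connections": ["private-vpc-network"]}), or by selecting it in the job's connection settings in the console.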

https://docs.aws.amazon.com/glue/latest/dg/connection-JDBC-VPC.html

To allow AWS Glue to communicate with its components, specify a security group with a self-referencing inbound rule for all TCP ports. By creating a self-referencing rule, you can restrict the source to the same security group in the VPC and not open it to all networks.
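The self-referencing rule quoted above could be added with boto3 along these lines. This is only a sketch: the security group ID is a placeholder and the helper names are invented for illustration. The rule names the group itself as the traffic source, so nothing is opened to other networks.

```python
def self_referencing_tcp_rule(security_group_id):
    """Build an inbound rule that allows all TCP ports from the group itself."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": 0,
            "ToPort": 65535,
            # The source is the same security group, not a CIDR range.
            "UserIdGroupPairs": [{"GroupId": security_group_id}],
        }
    ]


def allow_self_traffic(security_group_id):
    """Apply the rule to the group (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the rule builder works without it

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=security_group_id,
        IpPermissions=self_referencing_tcp_rule(security_group_id),
    )


# Placeholder ID; use the security group attached to the Glue connection.
rule = self_referencing_tcp_rule("sg-0123456789abcdef0")
```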

However, this requires the location and credentials of specific databases, and it is not clear if and when other traffic (e.g., a REST call to a service) is tunnelled through the VPC.

I agree the documentation is confusing, but according to this paragraph on the page you linked, it appears that all traffic is indeed tunnelled through the VPC, because once you have configured Glue with VPC access, you need a NAT gateway or VPC endpoints for it to reach anything outside the VPC:

All JDBC data stores that are accessed by the job must be available from the VPC subnet. To access Amazon S3 from within your VPC, a VPC endpoint is required. If your job needs to access both VPC resources and the public internet, the VPC needs to have a Network Address Translation (NAT) gateway inside the VPC.
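For the S3 endpoint mentioned in that paragraph, a gateway endpoint can be created with boto3 roughly as follows. The VPC ID, region, and route table ID are placeholders, and the helper names are hypothetical. The route table should be the one associated with the subnet the Glue connection uses.

```python
def s3_gateway_endpoint_request(vpc_id, region, route_table_ids):
    """Build the arguments for an S3 gateway endpoint in the given VPC."""
    return {
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "VpcEndpointType": "Gateway",
        # Routes to S3 are added to these route tables.
        "RouteTableIds": list(route_table_ids),
    }


def create_s3_endpoint(request):
    """Create the endpoint (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request builder works without it

    ec2 = boto3.client("ec2")
    return ec2.create_vpc_endpoint(**request)


# Placeholder values; substitute your own VPC and route table.
request = s3_gateway_endpoint_request(
    vpc_id="vpc-0123456789abcdef0",
    region="us-east-1",
    route_table_ids=["rtb-0123456789abcdef0"],
)
```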
