AWS Glue: Accessing Redshift in a VPC

I am trying to ETL data from a Redshift cluster (in a VPC) to an S3 bucket using AWS Glue. For this I created a JDBC connection to Redshift.

The crawler successfully fetches schema information from Redshift into the Data Catalog, but when I run the ETL job, it fails to fetch data with the error "resource unavailable".

  1. Do I need to configure a NAT for Glue to connect to Redshift? (Currently there is no NAT.)
  2. How was the crawler able to read schema information from Redshift without a NAT?

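Redshift is inside your VPC. Glue, through the connection's subnet, is inside your VPC too. S3 isn't. By default, reaching S3 from inside a VPC requires a route to the Internet. That is also why the crawler works: it only reads table metadata from Redshift over JDBC, which never leaves the VPC, while the ETL job also has to reach S3 (the target bucket, plus the temporary S3 directory Glue uses when reading from Redshift).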
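A quick way to check the "route to the Internet" point is to look at the route table of the subnet the Glue connection uses. The sketch below uses hypothetical helpers, not part of any AWS SDK: `routes_for_subnet` takes an injected EC2 client (for example `boto3.client("ec2")`) and `has_private_path_to_s3` inspects the returned routes. It assumes that Glue's network interfaces only get private IPs, so an internet gateway route alone does not count, and that a route targeting an instance is a NAT instance. S3 interface endpoints do not appear in route tables and are not detected.

```python
from typing import Any, Dict, List


def routes_for_subnet(ec2: Any, subnet_id: str, vpc_id: str) -> List[Dict[str, Any]]:
    """Fetch the routes that apply to a subnet via an injected EC2 client."""
    resp = ec2.describe_route_tables(
        Filters=[{"Name": "association.subnet-id", "Values": [subnet_id]}]
    )
    tables = resp["RouteTables"]
    if not tables:
        # No explicit association: the subnet falls back to the VPC's main route table.
        resp = ec2.describe_route_tables(
            Filters=[
                {"Name": "vpc-id", "Values": [vpc_id]},
                {"Name": "association.main", "Values": ["true"]},
            ]
        )
        tables = resp["RouteTables"]
    return tables[0]["Routes"] if tables else []


def has_private_path_to_s3(routes: List[Dict[str, Any]]) -> bool:
    """True if an active route goes through a NAT gateway, a NAT instance,
    or a gateway VPC endpoint (whose target id starts with "vpce-")."""
    for route in routes:
        if route.get("State") == "blackhole":
            # The route's target no longer exists, so it carries no traffic.
            continue
        if route.get("NatGatewayId"):
            return True  # NAT gateway
        if route.get("InstanceId"):
            return True  # assumed to be a NAT instance
        if str(route.get("GatewayId", "")).startswith("vpce-"):
            return True  # S3 gateway endpoint (prefix-list route)
    # "local" and "igw-..." routes alone are not enough for private-IP-only interfaces.
    return False
```

If this returns `False` for the connection's subnet, the job has no path to S3, even though the crawler, which only talks to Redshift, can still succeed.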

To access data in S3, you need either a NAT Gateway, a NAT Instance, or an S3 VPC Endpoint, which provides a termination point for S3 traffic inside the VPC.
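Of these options, the S3 gateway endpoint does not require running a NAT device. Below is a minimal sketch, assuming boto3's EC2 `create_vpc_endpoint` call and an injected client; the function names are hypothetical, and `route_table_ids` must be the route table(s) associated with the Glue connection's subnet. Note that a gateway endpoint only reaches buckets in the same Region as the VPC.

```python
from typing import Any, Dict, List


def s3_gateway_endpoint_request(region: str, vpc_id: str,
                                route_table_ids: List[str]) -> Dict[str, Any]:
    """Build CreateVpcEndpoint parameters for an S3 gateway endpoint."""
    return {
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "VpcEndpointType": "Gateway",
        # A gateway endpoint works by adding an S3 prefix-list route to these
        # tables, so include the table used by the Glue connection's subnet.
        "RouteTableIds": list(route_table_ids),
    }


def create_s3_gateway_endpoint(ec2: Any, region: str, vpc_id: str,
                               route_table_ids: List[str]) -> str:
    """Create the endpoint with an injected EC2 client and return its id."""
    resp = ec2.create_vpc_endpoint(
        **s3_gateway_endpoint_request(region, vpc_id, route_table_ids)
    )
    return resp["VpcEndpoint"]["VpcEndpointId"]


# Example (placeholder ids; requires boto3 and AWS credentials):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   create_s3_gateway_endpoint(ec2, "us-east-1", "vpc-...", ["rtb-..."])
```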

For anyone coming across this: it is still an ongoing issue. In my setup, the cause was the Availability Zone of the subnet used by the RDS connection, but as I understand it, this applies to any connection type.

The "fix" was to:

  1. AWS Console > Glue > Connections > Edit Connection > See which subnet the connection is using.
  2. AWS Console > VPC > Subnets > Identify (or create) a subnet in a different Availability Zone.
  3. AWS Console > Glue > Connections > Edit Connection > Switch to use the subnet from step 2.
  4. Run job.
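Steps 1 to 3 can also be scripted. This is a minimal sketch, assuming boto3-style Glue and EC2 clients are passed in (`get_connection`, `update_connection`, `describe_subnets`); `switch_connection_subnet` and `pick_subnet_in_other_az` are hypothetical names. It keeps the connection's security groups and only moves it to another subnet in the same VPC, so it assumes that subnet also has a path to S3 and that the security groups still allow the Redshift traffic.

```python
from typing import Any, Dict, List, Optional

# Keys of a GetConnection result that UpdateConnection's ConnectionInput accepts.
_INPUT_KEYS = ("Name", "Description", "ConnectionType", "MatchCriteria",
               "ConnectionProperties", "PhysicalConnectionRequirements")


def pick_subnet_in_other_az(current: Dict[str, Any],
                            subnets: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the first subnet in the same VPC but a different AZ, or None."""
    for subnet in subnets:
        if (subnet["VpcId"] == current["VpcId"]
                and subnet["AvailabilityZone"] != current["AvailabilityZone"]):
            return subnet
    return None


def switch_connection_subnet(glue: Any, ec2: Any, connection_name: str) -> Optional[str]:
    """Move a Glue connection to a subnet in another AZ; return the new subnet id."""
    conn = glue.get_connection(Name=connection_name)["Connection"]
    phys = conn["PhysicalConnectionRequirements"]
    # Step 1: which subnet (and AZ) is the connection using?
    current = ec2.describe_subnets(SubnetIds=[phys["SubnetId"]])["Subnets"][0]
    # Step 2: find a subnet in the same VPC but a different AZ.
    candidates = ec2.describe_subnets(
        Filters=[{"Name": "vpc-id", "Values": [current["VpcId"]]}]
    )["Subnets"]
    target = pick_subnet_in_other_az(current, candidates)
    if target is None:
        return None  # nothing to switch to; a subnet must be created first
    # Step 3: rewrite the connection with the new subnet and its AZ.
    conn_input = {k: conn[k] for k in _INPUT_KEYS if k in conn}
    conn_input["PhysicalConnectionRequirements"] = {
        **phys,
        "SubnetId": target["SubnetId"],
        "AvailabilityZone": target["AvailabilityZone"],
    }
    glue.update_connection(Name=connection_name, ConnectionInput=conn_input)
    return target["SubnetId"]
```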

If the job still fails with "Resource unavailable", repeat these steps with another subnet until it works.
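To see whether another attempt is needed without watching the console, the job can be started and polled until it reaches a terminal state. This is a sketch using an injected boto3-style Glue client (`start_job_run`, `get_job_run`); the helper names, the 30-second default poll interval, and the assumption that the failed run's `ErrorMessage` contains the text "resource unavailable" are all illustrative.

```python
import time
from typing import Any, Callable, Tuple

# JobRunState values after which a run will not change any more.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def run_job_and_wait(glue: Any, job_name: str, poll_seconds: float = 30.0,
                     sleep: Callable[[float], None] = time.sleep) -> Tuple[str, str]:
    """Start a Glue job run and poll it; return (final state, error message or "")."""
    run_id = glue.start_job_run(JobName=job_name)["JobRunId"]
    while True:
        run = glue.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
        state = run["JobRunState"]
        if state in TERMINAL_STATES:
            return state, run.get("ErrorMessage", "")
        sleep(poll_seconds)


def needs_another_subnet(state: str, error_message: str) -> bool:
    """True if the run did not succeed and failed with the 'resource unavailable' symptom."""
    return state != "SUCCEEDED" and "resource unavailable" in error_message.lower()
```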
