AWS Glue ETL job from AWS Redshift to S3 fails

Original 2017-08-22 08:50:42 4 3 amazon-web-services/ amazon-s3/ amazon-redshift/ aws-glue

I am trying out the AWS Glue service to ETL some data from Redshift to S3. The crawler runs successfully and creates the metadata table in the Data Catalog; however, when I run the ETL job (generated by AWS), it fails after around 20 minutes saying "Resource unavailable".

I cannot see any AWS Glue logs or error logs in CloudWatch. When I try to view them, it says "Log stream not found. The log stream jr_xxxxxxxxxx could not be found. Check if it was correctly created and retry."

I would appreciate it if you could provide any guidance to resolve this issue.

3 answers

Basically, a job you add to Glue will run only if there is not too much traffic in the region where your Glue job runs. If no resources are available, you need to either re-add the job manually or subscribe to CloudWatch events via SNS.
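One way to subscribe to such events is an event rule that matches Glue job state changes and forwards them to an SNS topic. The following is a minimal sketch using boto3, not the answerer's own setup: the job name, rule name, and topic ARN are hypothetical placeholders, the SNS topic is assumed to already allow the events service to publish to it, and whether a "Resource unavailable" run emits one of these states is not confirmed by this thread.

```python
import json


def glue_failure_event_pattern(job_name):
    """Event pattern matching unsuccessful state changes of one Glue job."""
    return {
        "source": ["aws.glue"],
        "detail-type": ["Glue Job State Change"],
        "detail": {
            "jobName": [job_name],
            "state": ["FAILED", "TIMEOUT", "STOPPED"],
        },
    }


def notify_on_glue_failure(job_name, rule_name, sns_topic_arn):
    """Create the rule and point it at an SNS topic.

    Needs boto3 and AWS credentials; all three arguments are
    hypothetical values supplied by the caller.
    """
    import boto3  # imported lazily so the pattern helper works without boto3

    events = boto3.client("events")
    events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(glue_failure_event_pattern(job_name)),
        State="ENABLED",
    )
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "glue-failure-sns", "Arn": sns_topic_arn}],
    )
```

A subscriber on the topic (email, Lambda, and so on) can then decide whether to start the job again.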

There are also parameters you can pass to the job, such as the maximum number of retries (MaxRetries) and the timeout (Timeout).

If you get a "Resource unavailable" error, it won't trigger a retry, because the job did not fail; it never even started. But if you set the timeout to, say, 60 minutes, the job will raise an error after that time, decrement your retry pool, and relaunch.
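As a rough illustration of the retry and timeout settings described above, here is a sketch that builds them as keyword arguments for boto3's Glue `create_job` call. The helper names, the default of 2 retries, and the job name, role ARN, and script location passed in by the caller are all assumptions made for the example; the 60-minute timeout mirrors the figure in the answer.

```python
def job_retry_settings(max_retries=2, timeout_minutes=60):
    """Build the retry/timeout arguments for a Glue job definition.

    Timeout is expressed in minutes; the default values here are
    illustrative, not recommendations.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    if timeout_minutes < 1:
        raise ValueError("timeout_minutes must be >= 1")
    return {"MaxRetries": max_retries, "Timeout": timeout_minutes}


def create_job_with_retries(name, role_arn, script_location, **settings):
    """Create a Glue ETL job with the settings above.

    Needs boto3 and AWS credentials; name, role_arn and
    script_location are hypothetical caller-supplied values.
    """
    import boto3  # imported lazily so the helper above works without boto3

    glue = boto3.client("glue")
    return glue.create_job(
        Name=name,
        Role=role_arn,
        Command={"Name": "glueetl", "ScriptLocation": script_location},
        **job_retry_settings(**settings),
    )
```

With a timeout in place, a run that hangs waiting for resources eventually ends in an error state, which is what lets the retry count come into play.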

The closest thing I see to Glue documentation on this is here:

If you encounter errors in AWS Glue, use the following solutions to help you find the source of the problems and fix them.

Note: The AWS Glue GitHub repository contains additional troubleshooting guidance in AWS Glue Frequently Asked Questions.

Error: Resource Unavailable

If AWS Glue returns a resource unavailable message, you can view error messages or logs to help you learn more about the issue. The following tasks describe general methods for troubleshooting.

• A custom DNS configuration without reverse lookup can cause AWS Glue to fail. Check your DNS configuration. If you are using Amazon Route 53 or Microsoft Active Directory, make sure that there are forward and reverse lookups. For more information, see Setting Up DNS in Your VPC (p. 23).

• For any connections and development endpoints that you use, check that your cluster has not run out of elastic network interfaces.
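The forward and reverse lookup check mentioned in the quoted documentation can be approximated with Python's standard `socket` module. This is only a sketch: the function name is made up for the example, the result is meaningful only when run from a host inside the same VPC and using the same DNS settings as the Glue connection, and the hostname you pass in is your own.

```python
import socket


def check_dns_round_trip(hostname,
                         forward=socket.gethostbyname,
                         reverse=socket.gethostbyaddr):
    """Resolve hostname -> IP -> hostname and report what succeeded.

    The forward/reverse parameters default to the real resolvers and
    exist only so the logic can be exercised without network access.
    """
    result = {"hostname": hostname, "ip": None, "reverse_name": None}
    try:
        result["ip"] = forward(hostname)
    except OSError:
        # Forward lookup failed; nothing to reverse.
        return result
    try:
        # gethostbyaddr returns (hostname, aliases, addresses).
        result["reverse_name"] = reverse(result["ip"])[0]
    except OSError:
        # Reverse (PTR) lookup is missing or failed.
        pass
    return result
```

If `ip` is filled in but `reverse_name` stays `None`, the reverse lookup is the part to investigate.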

I recently struggled with a "Resource unavailable" error thrown by a Glue job.

I was also unable to create a direct connection to RDS in Glue; it said "no suitable security group found".

I faced this issue while trying to connect to AWS RDS and Redshift.

The problem was with the security group that Redshift was using. The security group needs a self-referencing inbound rule.

For those who don't know what a self-referencing inbound rule is, follow these steps:

1) Go to the Security Group you are using (VPC -> Security Group)

2) In the Inbound Rules select Edit Inbound Rules

3) Add a Rule

a) Type - All Traffic
b) Protocol - All
c) Port Range - All
d) Source - Custom; in the field provided, start typing the name of your security group and select it.
e) Save it.

It's done!

If this rule was missing from your security group's inbound rules, try creating the connection again; you should now be able to create it.

The job should also work this time.
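The console steps above can also be sketched with boto3's `authorize_security_group_ingress`. This is an illustration under assumptions rather than the answerer's method: the helper names are invented, and the security group ID is a hypothetical value you would replace with the ID of the group attached to your Redshift cluster or RDS instance.

```python
def self_referencing_rule(security_group_id):
    """Inbound permission allowing all traffic from the group itself."""
    return [
        {
            "IpProtocol": "-1",  # -1 means all protocols and all ports
            "UserIdGroupPairs": [{"GroupId": security_group_id}],
        }
    ]


def add_self_referencing_rule(security_group_id):
    """Add the rule to the group (needs boto3 and AWS credentials).

    security_group_id is a hypothetical caller-supplied value,
    for example the group used by the Redshift cluster.
    """
    import boto3  # imported lazily so the helper above works without boto3

    ec2 = boto3.client("ec2")
    return ec2.authorize_security_group_ingress(
        GroupId=security_group_id,
        IpPermissions=self_referencing_rule(security_group_id),
    )
```

The source of the rule is the group's own ID, which is what makes it self-referencing.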

AWS Glue ETL Job triggered on batches of S3 Events

AWS Glue load new partitions from ETL job fails

AWS Glue ETL to Redshift: DATE

AWS Glue job fails to write to Redshift

When to use Amazon Redshift spectrum over AWS Glue ETL to query on Amazon S3 data

How to Trigger Glue ETL Pyspark job through S3 Events or AWS Lambda?

Loading parquet file from S3 to AWS RDS taking extremely long time using AWS Glue ETL

AWS Glue ETL : transfer data to S3 Bucket

AWS Glue: ETL to read S3 CSV files

How to Read Filename from S3 using AWS Glue ETL Tools



solution1
5 2018-04-26 18:49:51

solution2
1 2017-09-13 15:57:53

solution3
0 2020-06-02 11:18:36
