AWS: Automating queries in redshift

Question

I want to automate a redshift insert query to be run every day.

We actually use Aws environment. I was told using lambda is not the right approach. Which is the best ETL process to automate a query in Redshift.

Answer 1

For automating SQL on Redshift you have 3 options (at least)

Simple - cron Use a EC2 instance and set up a cron job on that to run your SQL code.

psql -U youruser -p 5439 -h hostname_of_redshift -f your_sql_file

Feature rich - Airflow (Recommended) If you have a complex schedule to run then it is worth investing time learning and using apache airflow. This also needs to run on a server(ec2) but offers a lot of functionality.

https://airflow.apache.org/

AWS serverless - AWS data pipeline (NOT Recommended)

https://aws.amazon.com/datapipeline/

Cloudwatch->Lambda->EC2 method described below by John Rotenstein This is a good method when you want to be AWS centric, it will be cheaper than having a dedicated EC2 instance.

Answer 2

One option:

Use Amazon CloudWatch Events on a schedule to trigger an AWS Lambda function
The Lambda function launches an EC2 instance with a User Data script. Configure Shutdown Behavior as Terminate .
The EC2 instance executes the User Data script
When the script is complete, it should call sudo shutdown now -h to shutdown and terminate the instance

The EC2 instance will only be billed per-second .

Answer 3

Redshift 现在原生支持计划查询： https : //docs.aws.amazon.com/redshift/latest/mgmt/query-editor-schedule-query.html

Answer 4

You can use boto3 and psycopg2 to run the queries by creating a python script and scheduling it in cron to be executed daily.

You can also try to convert your queries into Spark jobs and schedule those jobs to run in AWS Glue daily. If you find it difficult, you can also look into Spark SQL and give it a shot. If you are going with Spark SQL, keep in mind the memory usage as Spark SQL is pretty memory intensive.

AWS: Automating queries in redshift

Question

4 answers

solution1
4 ACCPTED 2018-09-13 07:20:10

solution2
2 2018-09-13 11:58:32

solution3
2 2020-11-06 21:43:55

solution4
0 2019-04-05 09:22:00

AWS: Automating queries in redshift

Question

4 answers

solution1 4 ACCPTED 2018-09-13 07:20:10

solution2 2 2018-09-13 11:58:32

solution3 2 2020-11-06 21:43:55

solution4 0 2019-04-05 09:22:00

solution1
4 ACCPTED 2018-09-13 07:20:10

solution2
2 2018-09-13 11:58:32

solution3
2 2020-11-06 21:43:55

solution4
0 2019-04-05 09:22:00