AWS: Automating queries in Redshift
I want to automate a Redshift insert query so that it runs every day.
We use an AWS environment. I was told that using Lambda is not the right approach. What is the best ETL process for automating a query in Redshift?
For automating SQL on Redshift you have (at least) three options:
Simple - cron: Use an EC2 instance and set up a cron job on it to run your SQL code.
psql -U youruser -p 5439 -h hostname_of_redshift -f your_sql_file
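A crontab entry for the above might look like the fragment below (the schedule, file path, and log location are illustrative; it assumes a `.pgpass` file or `PGPASSWORD` is configured so psql can authenticate non-interactively):

```
# run the SQL file every day at 06:00 UTC, appending output to a log
0 6 * * * psql -U youruser -p 5439 -h hostname_of_redshift -f /home/ec2-user/your_sql_file.sql >> /var/log/redshift_job.log 2>&1
```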
Feature rich - Airflow (Recommended): If you have a complex schedule to run, it is worth investing the time to learn and use Apache Airflow. It also needs to run on a server (EC2), but it offers a lot of functionality.
https://airflow.apache.org/
AWS serverless - AWS Data Pipeline (NOT recommended)
https://aws.amazon.com/datapipeline/
Cloudwatch -> Lambda -> EC2 method, described below by John Rotenstein: This is a good method when you want to stay AWS-centric, and it will be cheaper than having a dedicated EC2 instance.
One option:
Have Amazon CloudWatch Events trigger an AWS Lambda function on a schedule, and have the Lambda function launch an EC2 instance whose User Data script runs your SQL. Configure the instance's Shutdown Behavior as Terminate. When the script has finished, it should call sudo shutdown now -h to shut down and terminate the instance. The EC2 instance will only be billed per-second.
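A sketch of the Lambda side of this pattern, assuming boto3 and an AMI that has psql installed (the AMI id, instance type, and SQL file path are placeholders, not values from the answer):

```python
# Lambda handler that launches a short-lived EC2 instance to run a Redshift SQL job.
# The instance terminates itself when the script finishes (Shutdown Behavior = Terminate).

# User Data script: runs the SQL via psql, then shuts the instance down.
# Assumes the AMI has psql installed and credentials available (e.g. via .pgpass).
USER_DATA = """#!/bin/bash
psql -U youruser -p 5439 -h hostname_of_redshift -f /home/ec2-user/your_sql_file.sql
sudo shutdown now -h
"""

def lambda_handler(event, context):
    import boto3  # available in the AWS Lambda runtime

    ec2 = boto3.client("ec2")
    ec2.run_instances(
        ImageId="ami-xxxxxxxx",        # placeholder AMI with psql baked in
        InstanceType="t3.micro",
        MinCount=1,
        MaxCount=1,
        UserData=USER_DATA,
        # ensures "shutdown" terminates (rather than stops) the instance
        InstanceInitiatedShutdownBehavior="terminate",
    )
```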
Redshift now natively supports scheduled queries: https://docs.aws.amazon.com/redshift/latest/mgmt/query-editor-schedule-query.html
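If you would rather drive the same thing from code instead of the console, the Redshift Data API lets you run SQL without managing a connection. A minimal sketch with boto3's redshift-data client (cluster, database, user, and SQL are placeholders):

```python
# Sketch: run a SQL statement against Redshift via the Data API (no driver needed).

SQL = "INSERT INTO my_table SELECT * FROM staging_table;"  # placeholder daily insert

def run_query():
    import boto3  # AWS SDK

    client = boto3.client("redshift-data")
    response = client.execute_statement(
        ClusterIdentifier="my-cluster",  # placeholder
        Database="mydb",
        DbUser="youruser",
        Sql=SQL,
    )
    # The call is asynchronous: poll describe_statement() with this id for status.
    return response["Id"]
```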
You can use boto3 and psycopg2 to run the queries by creating a Python script and scheduling it in cron to be executed daily.
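A minimal sketch of such a script (the connection parameters and query are placeholders; in practice you would load the password from the environment or AWS Secrets Manager via boto3 rather than hard-coding it):

```python
# Daily insert script for cron: connects to Redshift with psycopg2, runs one statement.

INSERT_SQL = "INSERT INTO my_table SELECT * FROM staging_table;"  # placeholder query

def main():
    import psycopg2  # PostgreSQL driver; Redshift speaks the Postgres wire protocol

    conn = psycopg2.connect(
        host="hostname_of_redshift",
        port=5439,
        dbname="mydb",
        user="youruser",
        password="...",  # better: load from the environment or Secrets Manager
    )
    try:
        with conn.cursor() as cur:
            cur.execute(INSERT_SQL)
        conn.commit()    # commit explicitly so the insert is persisted
    finally:
        conn.close()

if __name__ == "__main__":
    main()
```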
You can also try converting your queries into Spark jobs and scheduling those jobs to run in AWS Glue daily. If you find that difficult, you can also look into Spark SQL and give it a shot. If you go with Spark SQL, keep an eye on memory usage, as Spark SQL is fairly memory-intensive.
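The Spark SQL flavour of the job might be shaped like the sketch below (table names are placeholders; in Glue you would typically expose Redshift tables through a Glue connection and the Data Catalog, so treat this as a shape rather than a recipe):

```python
# Spark SQL sketch: the daily insert expressed as a Spark job.

QUERY = "INSERT INTO my_table SELECT * FROM staging_table"  # placeholder

def run_job():
    from pyspark.sql import SparkSession  # provided by the Glue/EMR runtime

    spark = SparkSession.builder.appName("daily_redshift_insert").getOrCreate()
    # Both tables must be visible to Spark (e.g. registered in the Glue Data Catalog).
    spark.sql(QUERY)
    spark.stop()
```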