
AWS: Automating queries in Redshift

I want to automate a Redshift insert query so that it runs every day.

We are using an AWS environment. I was told that using Lambda is not the right approach. What is the best ETL process for automating a query in Redshift?

For automating SQL on Redshift you have (at least) three options:

Simple - cron: Use an EC2 instance and set up a cron job on it to run your SQL code.

psql -U youruser -p 5439 -h hostname_of_redshift -f your_sql_file
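
To schedule it, you would add a crontab entry on the instance. A hypothetical entry that runs the file every day at 02:00 (assuming psql can pick up the password, e.g. from a ~/.pgpass file):

0 2 * * * psql -U youruser -p 5439 -h hostname_of_redshift -f your_sql_file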

Feature rich - Airflow (Recommended): If you have a complex schedule to run, it is worth investing the time to learn and use Apache Airflow. This also needs to run on a server (EC2), but it offers a lot of functionality.

https://airflow.apache.org/

AWS serverless - AWS Data Pipeline (NOT Recommended)

https://aws.amazon.com/datapipeline/

CloudWatch -> Lambda -> EC2, the method described below by John Rotenstein: This is a good approach when you want to stay AWS-centric, and it will be cheaper than keeping a dedicated EC2 instance running.

One option:

  • Use Amazon CloudWatch Events on a schedule to trigger an AWS Lambda function
  • The Lambda function launches an EC2 instance with a User Data script. Configure its Shutdown Behavior as Terminate.
  • The EC2 instance executes the User Data script
  • When the script is complete, it should call sudo shutdown now -h to shut down and terminate the instance

The EC2 instance will only be billed per second.
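
A minimal sketch of the Lambda side of this approach, in Python with boto3. The AMI ID, instance type, and SQL file location are placeholders; the real AMI would need psql preinstalled and credentials available to it (e.g. a baked-in ~/.pgpass):

import boto3

# User Data script run by the instance at boot: execute the SQL, then shut down.
# Because Shutdown Behavior is Terminate, the shutdown also terminates the instance.
USER_DATA = """#!/bin/bash
psql -U youruser -p 5439 -h hostname_of_redshift -f /home/ec2-user/your_sql_file
sudo shutdown now -h
"""

def handler(event, context):
    ec2 = boto3.client('ec2')
    ec2.run_instances(
        ImageId='ami-0123456789abcdef0',  # placeholder AMI with psql installed
        InstanceType='t3.micro',          # assumption; size to your workload
        MinCount=1,
        MaxCount=1,
        InstanceInitiatedShutdownBehavior='terminate',
        UserData=USER_DATA,
    )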

You can use boto3 and psycopg2 to run the queries by creating a Python script and scheduling it in cron to be executed daily.
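
A minimal sketch of such a script, assuming an IAM identity allowed to call redshift:GetClusterCredentials; the cluster identifier, database, user, region, and the INSERT statement are placeholders:

import boto3
import psycopg2

# Fetch temporary database credentials from Redshift instead of storing a password.
client = boto3.client('redshift', region_name='us-east-1')
creds = client.get_cluster_credentials(
    DbUser='youruser',
    DbName='yourdb',
    ClusterIdentifier='your-cluster',
    AutoCreate=False,
)

conn = psycopg2.connect(
    host='hostname_of_redshift',
    port=5439,
    dbname='yourdb',
    user=creds['DbUser'],
    password=creds['DbPassword'],
)
# The with-block commits on success and rolls back on error.
with conn, conn.cursor() as cur:
    cur.execute("INSERT INTO target_table SELECT * FROM staging_table;")
conn.close()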

You can also try converting your queries into Spark jobs and scheduling those jobs to run in AWS Glue daily. If you find that difficult, you can also look into Spark SQL and give it a shot. If you go with Spark SQL, keep an eye on memory usage, as Spark SQL is quite memory intensive.
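
A rough sketch of what such a Glue (PySpark) job could look like; the JDBC URL, credentials, table names, and the query itself are placeholders, and it assumes a Redshift JDBC driver is available to the job:

import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ['JOB_NAME'])
glueContext = GlueContext(SparkContext())
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Read the source table from Redshift over JDBC, transform it with Spark SQL,
# and append the result back to the target table.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:redshift://hostname_of_redshift:5439/yourdb")
      .option("dbtable", "staging_table")
      .option("user", "youruser")
      .option("password", "yourpassword")
      .load())
df.createOrReplaceTempView("staging")
result = spark.sql("SELECT * FROM staging")  # your query here
(result.write.format("jdbc")
      .option("url", "jdbc:redshift://hostname_of_redshift:5439/yourdb")
      .option("dbtable", "target_table")
      .option("user", "youruser")
      .option("password", "yourpassword")
      .mode("append")
      .save())
job.commit()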
