Run Redshift Queries Periodically

I have started researching Redshift, which AWS classifies as a "Database" service. From what I have learned so far, we can create tables and ingest data from S3, or from external sources such as Hive, into a Redshift database (cluster). We can also query these tables over a JDBC connection.

My questions are:

  1. Is there a place within the Redshift cluster where we can store our queries and run them periodically (for example, daily)?

  2. Can we store a query in an S3 location and use it to write output to another S3 location?

  3. Can we load a DB2 table unload file containing a mixture of binary and string fields into Redshift directly, or do we need an intermediate process to convert the data into something like CSV?

I have done some Googling about this. If you have links to resources, that would be very helpful. Thank you.

I used the cursor method from the psycopg2 library in Python. Sample code is given below. You have to set all the Redshift credentials in an env_vars file. You run your queries with cursor.execute; here I show one update query, which you can replace with your own (you can run multiple queries). After that, schedule this Python file with crontab or any other job scheduler to run your queries periodically.

import psycopg2
import env_vars  # local module that holds the Redshift credentials

rs = env_vars.RedshiftVariables
# Pass connection settings as keyword arguments instead of a hand-built string.
conn = psycopg2.connect(
    dbname=rs.REDSHIFT_DW,
    port=rs.REDSHIFT_PORT,
    user=rs.REDSHIFT_USERNAME,
    password=rs.REDSHIFT_PASSWORD,
    host=rs.REDSHIFT_HOST,
)
try:
    cursor = conn.cursor()
    # Put your own query here; call execute() once per additional query.
    cursor.execute("UPDATE database.demo_table SET Device_id = '123' WHERE Device = 'IPHONE' OR Device = 'Apple';")
    conn.commit()
finally:
    # Close the connection even if the query fails.
    conn.close()
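
If you want to run several queries in one scheduled job, something like the following might work. It is only a sketch: split_statements and run_statements are hypothetical helper names, not part of psycopg2, and the semicolon split is deliberately naive (it would break on semicolons inside string literals). It assumes conn is a connection obtained from psycopg2.connect as above.

```python
def split_statements(sql_text):
    """Split a SQL script on semicolons, dropping empty pieces.

    Naive on purpose: semicolons inside string literals are not handled.
    """
    return [part.strip() for part in sql_text.split(";") if part.strip()]


def run_statements(conn, statements):
    """Run each statement in order; commit them all or roll them all back."""
    cursor = conn.cursor()
    try:
        for statement in statements:
            cursor.execute(statement)
        conn.commit()
    except Exception:
        # Undo any partial work so a failed run leaves the tables unchanged.
        conn.rollback()
        raise
    finally:
        cursor.close()


# Hypothetical crontab entry to run the script every day at 02:00
# (the script path is an assumption, adjust it to your setup):
#   0 2 * * * /usr/bin/python3 /path/to/run_queries.py
```

Running everything in a single transaction means a failure in one query does not leave earlier updates half applied; if your queries are independent, committing after each one may suit you better.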
