
Periodically moving query results from Redshift to S3 bucket

I have my data in a table in a Redshift cluster. I want to periodically run a query against the Redshift table and store the results in an S3 bucket.

I will be running some data transformations on this data in the S3 bucket to feed into another system. As per the AWS documentation I can use the UNLOAD command, but is there a way to schedule it to run periodically? I have searched a lot but haven't found any relevant information about this.

You can use a scheduling tool like Airflow to accomplish this task. Airflow connects seamlessly to Redshift and S3. You can have a DAG task that runs periodically and unloads the data from Redshift to S3.
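For illustration, here is a minimal sketch of such a DAG, assuming Airflow 2.x import paths and psycopg2 for the Redshift connection; the cluster endpoint, credentials, table name, S3 path and IAM role ARN are all placeholder values, not real ones.

    # Minimal sketch: an Airflow DAG that runs UNLOAD against Redshift on a schedule.
    from datetime import datetime

    import psycopg2
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Redshift writes the result files to S3 itself; the DAG only issues the statement.
    UNLOAD_SQL = """
    UNLOAD ('SELECT * FROM my_table')
    TO 's3://my-bucket/exports/my_table_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftUnloadRole'
    ALLOWOVERWRITE;
    """

    def unload_to_s3():
        # Plain connection to the cluster; in practice, pull credentials from an
        # Airflow Connection or a secrets backend rather than hard-coding them.
        conn = psycopg2.connect(
            host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
            port=5439,
            dbname="mydb",
            user="myuser",
            password="mypassword",
        )
        try:
            with conn.cursor() as cur:
                cur.execute(UNLOAD_SQL)
            conn.commit()
        finally:
            conn.close()

    with DAG(
        dag_id="redshift_unload_to_s3",
        start_date=datetime(2020, 1, 1),
        schedule_interval="@hourly",  # run the export once an hour
        catchup=False,
    ) as dag:
        PythonOperator(task_id="unload_to_s3", python_callable=unload_to_s3)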

I don't believe Redshift has the ability to schedule queries periodically. You would need to use another service for this. You could use a Lambda function, or you could schedule a cron job on an EC2 instance.
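As one possible shape for the Lambda option, here is a minimal sketch of a handler that submits the same UNLOAD through the Redshift Data API (boto3's redshift-data client) and could be triggered by a scheduled EventBridge/CloudWatch Events rule; the cluster identifier, database, user, S3 path and role ARN are placeholders.

    # Minimal sketch: a Lambda handler that kicks off UNLOAD via the Redshift Data API.
    import boto3

    redshift_data = boto3.client("redshift-data")

    UNLOAD_SQL = """
    UNLOAD ('SELECT * FROM my_table')
    TO 's3://my-bucket/exports/my_table_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftUnloadRole'
    ALLOWOVERWRITE;
    """

    def lambda_handler(event, context):
        # execute_statement is asynchronous: it queues the SQL on the cluster and
        # returns immediately, which keeps the Lambda invocation short.
        response = redshift_data.execute_statement(
            ClusterIdentifier="my-cluster",
            Database="mydb",
            DbUser="myuser",
            Sql=UNLOAD_SQL,
        )
        return {"statementId": response["Id"]}

A cron job on an EC2 instance could equally run the same UNLOAD statement through psql or a small script on whatever schedule you need.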

I believe you are looking for the AWS Data Pipeline service.

You can copy data from Redshift to S3 using the RedshiftCopyActivity (http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-redshiftcopyactivity.html).

I am copying the relevant content from the above URL here for future reference:

" You can also copy from Amazon Redshift to Amazon S3 using RedshiftCopyActivity. For more information, see S3DataNode. You can use SqlActivity to perform SQL queries on the data that you've loaded into Amazon Redshift. "

Let me know if this helped.

You should try AWS Data Pipeline. You can schedule it to run periodically or on demand. I am confident that it would solve your use case.
