
Is there a way to put data into Kinesis Firehose from an S3 bucket?

I want to write streaming data from an S3 bucket into Redshift through Firehose, since the data is streaming in real time (600 files every minute), and I don't want any form of data loss.

How do I put data from S3 into Kinesis Firehose?

It appears that your situation is:

  • Files randomly appear in S3 from an SFTP server
  • You would like to load the data into Redshift

There are two basic ways you could do this:

  • Load the data directly from Amazon S3 into Amazon Redshift, or
  • Send the data through Amazon Kinesis Firehose

Frankly, there's little benefit in sending it via Kinesis Firehose, because Firehose will simply batch it up, store it in temporary S3 files and then load it into Redshift. Therefore, this would not be a beneficial approach.

Instead, I would recommend:

  • Configure an event on the Amazon S3 bucket to send a message to an Amazon SQS queue whenever a file is created
  • Configure Amazon CloudWatch Events to trigger an AWS Lambda function periodically (eg every hour, or 15 minutes, or whatever meets your business need)
  • The AWS Lambda function reads the messages from SQS, constructs a manifest file, and then triggers Redshift to import the files listed in the manifest (see the sketch below)

This is a simple, loosely-coupled solution that will be much easier to operate than the Firehose approach (which would require somehow reading each file and sending the contents to Firehose).
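For illustration, here is a minimal sketch of such a Lambda function, assuming the S3 events arrive in SQS in the standard S3 notification format. The queue URL, bucket, cluster, database, table, and IAM role names are all hypothetical placeholders:

```python
import json
from urllib.parse import unquote_plus

import boto3

sqs = boto3.client("sqs")
s3 = boto3.client("s3")
redshift_data = boto3.client("redshift-data")

# Hypothetical names -- replace with your own resources.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/new-files-queue"
MANIFEST_BUCKET = "my-manifest-bucket"


def handler(event, context):
    entries = []
    # Drain the queued S3 event notifications.
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=1
        )
        messages = resp.get("Messages", [])
        if not messages:
            break
        for msg in messages:
            body = json.loads(msg["Body"])
            for rec in body.get("Records", []):
                bucket = rec["s3"]["bucket"]["name"]
                # S3 event keys are URL-encoded.
                key = unquote_plus(rec["s3"]["object"]["key"])
                entries.append({"url": f"s3://{bucket}/{key}", "mandatory": True})
            # Sketch only: deleting here risks loss if the COPY later fails;
            # in production, delete only after the COPY succeeds.
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])

    if not entries:
        return

    # Write a Redshift manifest file listing every new object.
    manifest_key = f"manifests/{context.aws_request_id}.manifest"
    s3.put_object(
        Bucket=MANIFEST_BUCKET,
        Key=manifest_key,
        Body=json.dumps({"entries": entries}),
    )

    # Kick off the load via the Redshift Data API (asynchronous).
    redshift_data.execute_statement(
        ClusterIdentifier="my-cluster",
        Database="mydb",
        DbUser="loader",
        Sql=f"""
            COPY my_table
            FROM 's3://{MANIFEST_BUCKET}/{manifest_key}'
            IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
            MANIFEST
            FORMAT AS JSON 'auto';
        """,
    )
```

Note that the Redshift Data API call is asynchronous; to honour the no-data-loss requirement you would poll describe_statement and delete the SQS messages only after the COPY has succeeded.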

Firehose is actually designed to do the opposite: it sends incoming streaming data to Amazon S3, not from Amazon S3. Besides S3, it can also deliver data to other services such as Redshift and the Elasticsearch Service.

I don't know whether this will solve your problem, but you can use the COPY command to load data from S3 into Redshift.
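As a minimal sketch, the COPY can be issued through the Redshift Data API; the cluster, database, table, S3 prefix, and IAM role below are hypothetical placeholders:

```python
import boto3

# Issue a COPY from an S3 prefix into a Redshift table (names are placeholders).
client = boto3.client("redshift-data")
client.execute_statement(
    ClusterIdentifier="my-cluster",
    Database="mydb",
    DbUser="loader",
    Sql="""
        COPY my_table
        FROM 's3://my-bucket/incoming/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
        FORMAT AS CSV;
    """,
)
```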

Hope it helps!
