
Is there a way to put data into Kinesis Firehose from S3 bucket?

I want to write streaming data from an S3 bucket into Redshift through Firehose, since the data is streaming in real time (600 files every minute) and I don't want any form of data loss.

How to put data from S3 into Kinesis Firehose?

It appears that your situation is:

  • Files randomly appear in S3 from an SFTP server
  • You would like to load the data into Redshift

There are two basic ways you could do this:

  • Load the data directly from Amazon S3 into Amazon Redshift, or
  • Send the data through Amazon Kinesis Firehose

Frankly, there's little benefit in sending it via Kinesis Firehose because Kinesis will simply batch it up, store it into temporary S3 files and then load it into Redshift. Therefore, this would not be a beneficial approach.

Instead, I would recommend:

  • Configure an event on the Amazon S3 bucket to send a message to an Amazon SQS queue whenever a file is created
  • Configure Amazon CloudWatch Events to trigger an AWS Lambda function periodically (eg every hour, or 15 minutes, or whatever meets your business need)
  • The AWS Lambda function reads the messages from SQS and constructs a manifest file, then triggers Redshift to import the files listed in the manifest file (a rough sketch of this function follows below)
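
As an illustration of that Lambda step, here is a minimal Python sketch. It assumes the incoming files are JSON and uses the Redshift Data API to submit the COPY; the queue URL, bucket names, table name `my_table` and environment variable names are all placeholders, not anything from the question:

```python
# Hypothetical sketch only: queue URL, buckets, cluster details, table name and
# environment variable names are placeholders, not taken from the question.
import json
import os
import time

import boto3

sqs = boto3.client("sqs")
s3 = boto3.client("s3")
redshift_data = boto3.client("redshift-data")

QUEUE_URL = os.environ["QUEUE_URL"]              # SQS queue fed by the S3 event notifications
MANIFEST_BUCKET = os.environ["MANIFEST_BUCKET"]  # bucket where the manifest file is written
CLUSTER_ID = os.environ["CLUSTER_ID"]
DATABASE = os.environ["DATABASE"]
DB_USER = os.environ["DB_USER"]
IAM_ROLE = os.environ["IAM_ROLE"]                # role Redshift assumes to read the S3 files


def handler(event, context):
    entries = []
    receipts = []

    # Drain the queued S3 "object created" notifications, up to 10 batches
    for _ in range(10):
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10)
        messages = resp.get("Messages", [])
        if not messages:
            break
        for msg in messages:
            body = json.loads(msg["Body"])
            for rec in body.get("Records", []):
                bucket = rec["s3"]["bucket"]["name"]
                key = rec["s3"]["object"]["key"]
                entries.append({"url": f"s3://{bucket}/{key}", "mandatory": True})
            receipts.append(msg["ReceiptHandle"])

    if not entries:
        return {"queued_files": 0}

    # Build a Redshift manifest file listing every newly arrived object
    manifest_key = f"manifests/{int(time.time())}.manifest"
    s3.put_object(
        Bucket=MANIFEST_BUCKET,
        Key=manifest_key,
        Body=json.dumps({"entries": entries}),
    )

    # Ask Redshift to COPY the files listed in the manifest
    redshift_data.execute_statement(
        ClusterIdentifier=CLUSTER_ID,
        Database=DATABASE,
        DbUser=DB_USER,
        Sql=(
            f"COPY my_table FROM 's3://{MANIFEST_BUCKET}/{manifest_key}' "
            f"IAM_ROLE '{IAM_ROLE}' FORMAT AS JSON 'auto' MANIFEST;"
        ),
    )

    # Delete the SQS messages only after the COPY has been submitted,
    # so nothing is lost if this invocation fails part-way through
    for receipt in receipts:
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=receipt)

    return {"queued_files": len(entries)}
```

Deleting the SQS messages only after the COPY is submitted means an unexpected failure simply leaves the notifications in the queue for the next scheduled run, which matches the "no data loss" requirement.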

This is a simple, loosely-coupled solution that will be much simpler than the Firehose approach (which would require somehow reading each file and sending the contents to Firehose).

It's actually designed to do the opposite: Firehose sends incoming streaming data to Amazon S3, not from Amazon S3, and besides S3 it can deliver data to other services such as Redshift and Elasticsearch Service.

I don't know whether this will solve your problem, but you can use COPY to load the data from S3 into Redshift.
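
For completeness, a minimal sketch of such a COPY issued from Python with psycopg2; the connection settings, table name and S3 path are illustrative placeholders:

```python
# Minimal sketch of COPY from S3 into Redshift via psycopg2; every connection
# setting, the table name and the S3 path are illustrative placeholders.
import os

import psycopg2

conn = psycopg2.connect(
    host=os.environ["REDSHIFT_HOST"],
    port=5439,
    dbname=os.environ["REDSHIFT_DB"],
    user=os.environ["REDSHIFT_USER"],
    password=os.environ["REDSHIFT_PASSWORD"],
)
with conn, conn.cursor() as cur:
    # COPY pulls the files directly from S3 into the target table
    cur.execute(
        """
        COPY my_table
        FROM 's3://my-bucket/incoming/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'
        FORMAT AS JSON 'auto';
        """
    )
conn.close()
```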

Hope it will help!
