
Cannot Archive Data from AWS Kinesis to Glacier

I am working on a data-processing application hosted as a web service on EC2; every second it generates a small data file (under 10 KB) in .csv format.

Problem statement: archive all the generated data files to Amazon Glacier.

My approach: since the data files are very small, I store them in AWS Kinesis, and after a few hours I flush the data to S3 (because I cannot find a direct way to move data from Kinesis to Glacier). Then, using S3 lifecycle management, I archive all the objects to Glacier at the end of the day.
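The S3-lifecycle step of this approach can be sketched as a single rule that transitions objects to Glacier. A minimal sketch with boto3 is below; the bucket name, key prefix, and one-day delay are assumptions for illustration, not values from the question.

```python
# Sketch of the "S3 lifecycle -> Glacier" step described above.
# The prefix, bucket name, and 1-day delay are hypothetical examples.

def glacier_lifecycle_config(prefix: str = "kinesis-flush/", days: int = 1) -> dict:
    """Build an S3 lifecycle configuration that transitions objects
    under `prefix` to the GLACIER storage class after `days` days."""
    return {
        "Rules": [
            {
                "ID": "archive-csv-to-glacier",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }

# To apply it (requires boto3 and AWS credentials):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-data-bucket",  # hypothetical bucket
#     LifecycleConfiguration=glacier_lifecycle_config(),
# )

print(glacier_lifecycle_config()["Rules"][0]["Transitions"])
```

Once the rule is in place, every object flushed under the prefix is transitioned automatically, with no per-file work needed.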

My questions:

  1. Is there a way to transfer data to Glacier directly from Kinesis?

  2. Is it possible to configure Kinesis to flush data to S3/Glacier at the end of the day? Is there any time or size limit on how long Kinesis can hold data?

  3. If Kinesis cannot transfer data to Glacier directly, is there a workaround? For example, can I write a Lambda function that fetches data from Kinesis and archives it to Glacier?

  4. Is it possible to merge all the .csv files at the Kinesis, S3, or Glacier level?

  5. Is Kinesis suitable for my use case? Is there anything else I could use?

I would be grateful if someone could take the time to answer my questions and point me to some references. Please let me know if there is a flaw in my approach or a better way to do this.

Thanks.

  1. You can't put data from Kinesis directly into Glacier (unless you want to put the 10 KB files directly into Glacier).
  2. You could consider Kinesis Data Firehose as a way of flushing 15-minute increments of data to S3.
  3. You can definitely do that. Glacier allows direct uploads, so there is no need to upload to S3 first.
  4. You could use Firehose to flush to S3, then transform and aggregate with Athena, then transition the resulting file to Glacier. Or you could use Lambda directly and upload straight to Glacier.
  5. Perhaps streaming the data into Firehose would make more sense. Depending on your exact needs, IoT Analytics might also be interesting.
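For point 3, the Lambda-based workaround could look roughly like the sketch below: Kinesis invokes the function with a batch of base64-encoded records, the function merges them into one payload, and uploads it as a single Glacier archive. The vault name is a hypothetical example, and the boto3 upload is commented so the sketch stays self-contained.

```python
import base64

def merge_kinesis_records(event: dict) -> bytes:
    """Kinesis hands Lambda a batch of base64-encoded records;
    concatenate their decoded payloads into one archive body."""
    return b"".join(
        base64.b64decode(r["kinesis"]["data"]) for r in event["Records"]
    )

def handler(event, context):
    body = merge_kinesis_records(event)
    # Upload the merged batch as a single Glacier archive
    # (requires boto3 and AWS credentials; vault name is hypothetical):
    # import boto3
    # boto3.client("glacier").upload_archive(vaultName="csv-archive", body=body)
    return {"bytes": len(body)}
```

Note that batching many small records into one archive also sidesteps Glacier's per-archive overhead, which matters for 10 KB files.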

Reading your question again, and seeing that you use .csv files, I would highly recommend the Kinesis > S3 > Athena > transition-to-Glacier approach.
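The Athena step of that recommendation is typically a CTAS (CREATE TABLE AS SELECT) query that compacts the many small .csv objects into one output location, which the S3 lifecycle rule can then transition to Glacier. A rough sketch follows; all database, table, and bucket names are hypothetical, and the boto3 call is commented out.

```python
# Sketch of the Athena aggregation step: merge one day's worth of small
# CSV objects into a single output prefix via CTAS. Names are hypothetical.

def daily_merge_query(day: str) -> str:
    """Build a CTAS query that writes all rows for `day` (YYYY-MM-DD)
    to a dedicated S3 prefix, ready for lifecycle transition."""
    suffix = day.replace("-", "_")
    return (
        f"CREATE TABLE archive_db.daily_{suffix}\n"
        f"WITH (\n"
        f"    format = 'TEXTFILE',\n"
        f"    external_location = 's3://my-data-bucket/merged/{day}/'\n"
        f") AS\n"
        f"SELECT * FROM raw_db.kinesis_csv\n"
        f"WHERE ingest_date = DATE '{day}'"
    )

# To run it (requires boto3 and AWS credentials):
# import boto3
# boto3.client("athena").start_query_execution(
#     QueryString=daily_merge_query("2020-01-31"),
#     ResultConfiguration={"OutputLocation": "s3://my-data-bucket/athena-results/"},
# )

print(daily_merge_query("2020-01-31"))
```

This also answers question 4: the merge happens at the S3 level, driven by Athena, rather than inside Kinesis or Glacier.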
