How reliable is the delivery of tumbling window data from AWS Kinesis Analytics?
When I analyze the contents of a Kinesis stream using a Kinesis Analytics SQL query that groups by time blocks, how certain can I be that all items in the stream are contained in the aggregates? Suppose I update the query during runtime: will the analytics application output aggregates v1 up to a point and then aggregates v2 for all items that were not yet reported on by v1? If something fails under the hood in the implementation, will a new node start reporting exactly from the point where the previous node ended? Or should you not rely on the completeness of these aggregations?
Answer posted on the AWS Forums, where I had cross-posted:
Please see the delivery semantics the service guarantees at https://docs.aws.amazon.com/kinesisanalytics/latest/dev/failover-checkpoint.html
The Analytics service maintains checkpoints, and if an update or any kind of failure happens, the application resumes from these checkpoints. Because of this design, the service may reprocess some of the same data and produce duplicate output. Downstream applications should be able to handle that.
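One common way for a downstream consumer to tolerate such duplicates is to make its writes idempotent. The following is a minimal sketch, not part of the AWS service or its API: it assumes each output row carries a window start time and a grouping key, and that a replayed window re-emits the same aggregate value. The names `WindowAggregate`, `window_start`, `group_key`, and `IdempotentSink` are hypothetical. Rows are upserted by `(window_start, group_key)`, so a redelivered row overwrites the earlier one instead of being counted twice.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class WindowAggregate:
    # Hypothetical shape of one output row from a tumbling-window query.
    window_start: str  # start of the time block the row aggregates
    group_key: str     # value of the GROUP BY column
    count: int         # aggregate for that window and key


class IdempotentSink:
    """Stores one value per (window_start, group_key); replays overwrite, never add."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], int] = {}

    def write(self, rows: Iterable[WindowAggregate]) -> None:
        for row in rows:
            # Upsert keyed by window and group: a duplicate delivery
            # of the same window simply replaces the earlier value.
            self._rows[(row.window_start, row.group_key)] = row.count

    def total(self) -> int:
        return sum(self._rows.values())


if __name__ == "__main__":
    sink = IdempotentSink()
    sink.write([WindowAggregate("10:00", "a", 5), WindowAggregate("10:00", "b", 3)])
    # Simulate a restart from a checkpoint that re-emits the ("10:00", "a") row.
    sink.write([WindowAggregate("10:00", "a", 5), WindowAggregate("10:01", "a", 2)])
    print(sink.total())  # 10
```

A naive sink that appended and summed every row would report 15 here, because the replayed row would be counted twice. In practice the same idea maps to a keyed upsert in the destination store; if a replay could emit a different partial value for a window, a last-write-wins rule like this one would need revisiting.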