Stateful and Stateless Streaming (Spark)

I know the difference between stateful and stateless stream processing. I read that Storm is stateless, while Trident is stateful. I also read that Hadoop (for batch processing) is stateful and that Spark can compute stateful operations.

Can someone clarify each of these? Specifically:

  1. Can Spark do both stateful and stateless operations?
  2. What does it mean that Hadoop is stateful, given that we talk only about batch processing when it comes to Hadoop?
  3. How does Apache Storm handle stateful streams (using Trident)?

1. Yes, Spark has both stateful and stateless operations. Stateful stream processing is available through mapWithState.

For more information, see https://databricks.com/blog/2016/02/01/faster-stateful-stream-processing-in-apache-spark-streaming.html
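
To make the distinction concrete, here is a minimal sketch in plain Python. It does not use the Spark API; the function names and the sample batches are made up for illustration. A stateless operation looks only at the current record, while a stateful one keeps per-key state across micro-batches, which is roughly the role mapWithState plays in Spark Streaming.

```python
from typing import Dict, List, Tuple

# A micro-batch is modeled as a list of (key, value) records.
Batch = List[Tuple[str, int]]


def stateless_double(batch: Batch) -> Batch:
    """Stateless: each output depends only on the current record."""
    return [(key, value * 2) for key, value in batch]


def stateful_running_sum(state: Dict[str, int], batch: Batch) -> Batch:
    """Stateful: each output depends on per-key state kept across batches."""
    out = []
    for key, value in batch:
        state[key] = state.get(key, 0) + value  # update the stored state
        out.append((key, state[key]))           # emit the running total
    return out


# Hypothetical input: three micro-batches arriving one after another.
batches: List[Batch] = [[("a", 1), ("b", 2)], [("a", 3)], [("b", 4), ("a", 5)]]

state: Dict[str, int] = {}
stateless_results = [stateless_double(b) for b in batches]
stateful_results = [stateful_running_sum(state, b) for b in batches]

print(stateless_results)
print(stateful_results)
```

The stateless results would be the same no matter what order the batches arrive in, whereas the running sums depend on everything seen so far for each key.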

2. Hadoop is stateful because it's read once.

