Should I use stateful computation? Explanation of stateful computation in Spark Streaming
Here is my case: I receive data from different devices, and each record carries its own signature, a timestamp and a flag. I then filter the records with `(flag==SAVE_VALUE)` and write them to a file using a `foreachRDD` function, but only if they pass this condition:
(it is the first time I receive this signature)
OR
(I already have this signature && the timestamp is older than an hour)
While I was working in a local environment, this meant using a Map in which I stored all the IDs and the last timestamp received. Now I would like to move this logic to Spark. How should I do it?
I feel this is a case for a stateful DStream, but I cannot completely understand how it works.
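For comparison, the local, Map-based filter described above can be sketched in plain Python. This is a minimal sketch under assumptions the question does not state: timestamps are epoch seconds, the flag is the string `"SAVE_VALUE"`, "older than an hour" means more than 3600 s after the previous timestamp for the same signature, and the map only tracks records that already passed the flag filter. The function name `select_records_to_save` is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Assumptions (not from the question): epoch-second timestamps and a string flag.
ONE_HOUR = 3600
SAVE_VALUE = "SAVE_VALUE"

Record = Tuple[str, int, str]  # (signature, timestamp, flag)


def select_records_to_save(records: Iterable[Record],
                           last_seen: Dict[str, int]) -> List[Record]:
    """Single-process version of the filter: keep a SAVE_VALUE record if its
    signature is new, or if the previous timestamp stored for that signature
    is more than an hour older than this one."""
    to_save: List[Record] = []
    for signature, ts, flag in records:
        if flag != SAVE_VALUE:
            continue  # only SAVE_VALUE records are candidates; state is untouched
        previous = last_seen.get(signature)
        if previous is None or ts - previous > ONE_HOUR:
            to_save.append((signature, ts, flag))
        # The question stores the last timestamp *received*, so update every time.
        last_seen[signature] = ts
    return to_save
```

The `last_seen` dictionary lives in one process's memory, which is why this approach does not carry over directly to Spark, where records are processed on several executors.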
Have a look at `mapWithState()`; it is exactly what you want.
In the `StateSpecFunction` (the mapping function you pass to `StateSpec.function(...)`), you can decide whether to update, keep, or remove the current state whenever a new value arrives for the same key. You have access to both the current state and the new value, so you can do any kind of comparison between the two.
It also has built-in support for timeouts, and its state can be partitioned across multiple executors.
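To illustrate what the timeout does, here is a plain-Python model of idle-key eviction (hypothetical helper `evict_idle`, not a Spark API). In Spark the timeout is set on the spec, for example `StateSpec.function(...).timeout(Minutes(60))`; a key that receives no data for at least that duration gets one final call of the mapping function with no value and `state.isTimingOut()` returning true, after which its state is removed. The model assumes `last_active` records the batch time at which each key last received data.

```python
from typing import Dict, List


def evict_idle(last_active: Dict[str, int], now: int, timeout: int) -> List[str]:
    """Drop and return the keys that have been idle for at least `timeout`.

    `last_active` maps each key to the batch time of its last data. Unlike
    Spark, this model does not invoke a mapping function for expired keys."""
    expired = [key for key, seen in last_active.items() if now - seen >= timeout]
    for key in expired:
        del last_active[key]
    return expired
```

For the question's use case, an evicted signature looks new the next time it arrives, so a 60-minute timeout alone roughly yields "first time, or idle for more than an hour". Note that Spark measures the timeout in batch (processing) time, not in the devices' own timestamps, so it is not an exact replacement for the timestamp comparison.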
You can access the global map by calling `stateSnapshots()` on the return value of `mapWithState()`.
Otherwise, the returned stream contains, for each batch, the values returned by your `StateSpecFunction`.
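The difference between the two outputs can be modelled in plain Python (not Spark code). The hypothetical `run_batch` below processes one batch: `mapped` holds one mapping-function result per input record, like the stream returned by `mapWithState()`, while `snapshot` copies the state of every key currently held, like `stateSnapshots()`. It reuses the question's rule under the same assumptions as before (epoch-second timestamps, last timestamp received is stored).

```python
from typing import Dict, List, Optional, Tuple

ONE_HOUR = 3600  # assumption: timestamps are epoch seconds

Mapped = Optional[Tuple[str, int]]


def run_batch(batch: List[Tuple[str, int]],
              store: Dict[str, int]) -> Tuple[List[Mapped], Dict[str, int]]:
    """Model of one mapWithState batch: returns (mapped, snapshot)."""
    mapped: List[Mapped] = []
    for signature, ts in batch:
        previous = store.get(signature)
        emit = previous is None or ts - previous > ONE_HOUR
        mapped.append((signature, ts) if emit else None)  # one result per record
        store[signature] = ts  # last timestamp received
    return mapped, dict(store)  # snapshot covers all keys, not just this batch
```

For example, after a first batch `[("a", 0), ("b", 10)]`, a second batch `[("a", 100)]` maps to `[None]`, yet its snapshot still contains `"b"` even though `"b"` received nothing in that batch.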
`mapWithState()` was added in Spark 1.6. Before that there was a similar function called `updateStateByKey()`, which did mostly the same thing but performed worse on larger datasets.