
Spark Streaming (Scala): window length by number of objects

I am using Spark with Scala, and I would like to create a window operation whose length is set by a number of objects rather than by time: the window starts empty, objects are stored in it as the stream arrives until it holds 10 objects, and when the 11th arrives the first is dropped.

Is this possible, or do I have to use another structure like a list or an array? The documentation ( http://spark.apache.org/docs/latest/streaming-programming-guide.html#window-operations ) and some googling only refer to time-based windows (length and interval).
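For reference, a time-based window as described in that documentation looks like the minimal sketch below; the socket source, port, and concrete durations are placeholders chosen for illustration:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object TimeWindowSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("time-window sketch")
    val ssc = new StreamingContext(conf, Seconds(1)) // 1-second batch interval

    // Placeholder source: lines read from a local socket.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Time-based window: the last 30 seconds of data, recomputed every 10 seconds.
    // Both durations must be multiples of the batch interval above.
    val windowedCounts = lines.window(Seconds(30), Seconds(10)).count()
    windowedCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```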

Thank you in advance.

A window in Spark Streaming is characterized by a windowDuration and an optional slideDuration, so it is strictly a time window. You could consider using Apache Flink instead: it supports both count windows and time windows. Compared to Spark, however, Flink follows a different streaming ideology: it processes incoming events as they arrive, whereas Spark processes events in micro-batches. As a result, Flink may have some restrictions of its own. Give it a try if it suits your needs.
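As a minimal sketch of the count-window approach (the bounded source, key extractor, and aggregation are placeholder assumptions, not part of the original answer), a sliding count window of size 10 that slides by 1 element gives exactly the "drop the oldest when the 11th arrives" behavior described in the question:

```scala
import org.apache.flink.streaming.api.scala._

object CountWindowSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Placeholder source: in practice this would be a socket, a Kafka topic, etc.
    val events: DataStream[(String, Int)] =
      env.fromElements(("a", 1), ("a", 2), ("a", 3))

    events
      .keyBy(_._1)          // count windows in Flink are defined per key
      .countWindow(10, 1)   // window of 10 elements, sliding by 1 element:
                            // when the 11th element arrives, the first falls out
      .sum(1)               // placeholder aggregation over the window contents
      .print()

    env.execute("count-window sketch")
  }
}
```

Note the slide of 1: the window fires for every incoming element, always covering the most recent 10 elements, which matches the question's requirement.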
