Spark streaming print on received stream
What I am trying to achieve is basically to print "hello world" each time I receive a stream of data.
I know that on each stream I can call the function foreachRDD, but that does not help me because:
Basically, each time the program tries to fetch data (and it does so every 30 seconds, let's say, because of the Spark streaming context), I would like to print hello.
Is there a way of doing this? Is there something like an "onlisten" event for Spark Streaming?
Each batch interval (in your case, 30 seconds) the DStream will contain one and only one RDD, which internally is divided into several partitions. You can check whether it is non-empty and only then print hello world:
// Create DStream from source
// foreachRDD runs this closure on the driver once per batch interval
dstream.foreachRDD { rdd => if (!rdd.isEmpty()) println("hello world") }
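For reference, here is a minimal self-contained local sketch of the approach above. Several pieces are assumptions that do not appear in the original: a `queueStream` test source stands in for the real input, the batch interval is 1 second instead of 30, and `HelloOnBatch`, `run` and the counters are hypothetical names added only so the behavior can be checked. The sketch also registers a `StreamingListener`, which seems to be the nearest match to the "onlisten" event asked about: its `onBatchStarted` callback fires for every batch, empty or not, whereas the `foreachRDD` check prints only for non-empty batches. It needs `spark-core` and `spark-streaming` on the classpath.

```scala
import java.util.concurrent.atomic.AtomicInteger

import scala.collection.mutable

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchStarted}

// Hypothetical demo object; not part of the original answer.
object HelloOnBatch {
  /** Runs a short local stream and returns (hello prints, batches started). */
  def run(): (Int, Int) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("HelloOnBatch")
    // Assumption: 1-second batches keep the demo short; the question's setup would use Seconds(30).
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumed stand-in source: with oneAtATime = true, queueStream dequeues one RDD per batch.
    // The middle RDD is empty, and once the queue is drained every later batch is empty too.
    val sc = ssc.sparkContext
    val queue = mutable.Queue[RDD[String]](
      sc.parallelize(Seq("a", "b")),
      sc.emptyRDD[String],
      sc.parallelize(Seq("c"))
    )
    val dstream = ssc.queueStream(queue, oneAtATime = true)

    val hellos = new AtomicInteger(0)
    val started = new AtomicInteger(0)

    // Approach from the answer: this closure runs on the driver once per batch,
    // and only non-empty batches print.
    dstream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        println("hello world")
        hellos.incrementAndGet()
      }
    }

    // Listener hook: called when each batch starts processing, whether or not it has data.
    ssc.addStreamingListener(new StreamingListener {
      override def onBatchStarted(batchStarted: StreamingListenerBatchStarted): Unit = {
        println(s"batch started at ${batchStarted.batchInfo.batchTime}")
        started.incrementAndGet()
      }
    })

    ssc.start()
    ssc.awaitTerminationOrTimeout(10000) // let several 1-second batches run
    ssc.stop(stopSparkContext = true, stopGracefully = false)
    (hellos.get(), started.get())
  }

  def main(args: Array[String]): Unit = {
    val (hellos, started) = run()
    println(s"hello printed $hellos times over $started batches")
  }
}
```

Running it should print "hello world" twice (once per non-empty queued RDD) and a "batch started" line for every batch. If you want a message on every batch regardless of content, either drop the `isEmpty()` check or rely on the listener instead.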