
Spark streaming print on received stream

What I am trying to achieve is basically to print "hello world" each time I receive a stream of data.

I know that on each stream I can call the foreachRDD function, but that does not help me because:

  1. It might be that there is no data to process.
  2. I don't want to print hello on each RDD; I want to print hello for the entire stream (whether I received data or not).

Basically, each time the program tries to fetch data (which it does, say, every 30 seconds because of the Spark streaming context), I would like to print hello.

Is there a way of doing this? Is there something like an on-listen event for Spark streaming?

Each batch interval (in your case, 30 seconds) the DStream will contain one and only one RDD, which is internally divided into several partitions. You can check whether it's empty and only then print hello world:

// Create DStream from source
dstream.foreachRDD { rdd => if (!rdd.isEmpty) println("hello world") }
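Putting the pieces together, here is a minimal sketch of a full streaming job with a 30-second batch interval. The socket source on localhost:9999 is an assumption for illustration; the key point is that the foreachRDD body runs once per batch interval even when no data arrived, which is exactly the "on each fetch" hook the question asks for:

```scala
// Minimal sketch, assuming a socket text source on localhost:9999.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object HelloEachBatch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("HelloEachBatch").setMaster("local[2]")
    // 30-second batch interval: foreachRDD fires once per interval,
    // whether or not the batch contains any records.
    val ssc = new StreamingContext(conf, Seconds(30))
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.foreachRDD { rdd =>
      // Runs every 30 seconds regardless of input. Drop the guard below
      // if you want hello printed even on empty batches.
      if (!rdd.isEmpty) println("hello world")
    }
    ssc.start()
    ssc.awaitTermination()
  }
}
```

If you want hello printed on every interval regardless of whether data arrived (as the question asks), simply remove the `!rdd.isEmpty` check.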
