
How to create a Kafka stream in PySpark?

I'm trying to create a Kafka stream and then apply some transformations to it, but the stream I create seems to be null. I load a text file into a producer and then consume it through a consumer, which works fine, but the Kafka stream is not created. The input text file has 36000 entries and looks like this:

10.000000
26.000000
-8.000000
-28.000000
...

And my Python code is:

import sys
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="STALTA")
ssc = StreamingContext(sc, 2)  # 2-second batch interval
broker, topic = sys.argv[1:]
kvs = KafkaUtils.createStream(ssc, broker, "raw-event-streaming-consumer", {topic: 1})
rdd = kvs.flatMap(lambda line: line.strip().split("\n")).map(lambda strelem: float(strelem))
print("****** ", rdd.count())
ssc.start()
ssc.awaitTermination()

rdd.count() should print 36000, but it's empty.

I run my script with the following command:

bin/spark-submit --jars jars/spark-streaming-kafka-0-8-assembly_2.11-2.3.1.jar examples/src/main/python/streaming/sparkkafka.py localhost:2181 consumer6

I also tried localhost:9092, but it didn't work.

Do you know what I'm doing wrong? Thank you.

I think you should change the broker argument to the ZooKeeper address, which is usually localhost:2181, because KafkaUtils.createStream requires the ZooKeeper address, not the Kafka broker address.

To print the DStream, you can call kvs.pprint() directly; it usually prints each record as a tuple. Don't use print(): Spark Streaming does not recognize it as an output operation, so you will get this error:

Error : java.lang.IllegalArgumentException: requirement failed: No output operations registered, so nothing to execute
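Putting both points together, here is a minimal sketch of how the script could look. It assumes the same spark-streaming-kafka-0-8 assembly as in the question; the names parse_samples, main and zk_quorum are made up for illustration. Since each record arrives as a (key, value) tuple, the sketch keeps only the value before parsing.

```python
import sys


def parse_samples(message):
    """Split one Kafka message value into floats, skipping blank lines."""
    return [float(tok) for tok in message.strip().split("\n") if tok.strip()]


def main(zk_quorum, topic):
    # Imported here so parse_samples can be used without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="STALTA")
    ssc = StreamingContext(sc, 2)  # 2-second batch interval
    # Receiver-based stream: the second argument is the ZooKeeper address.
    kvs = KafkaUtils.createStream(
        ssc, zk_quorum, "raw-event-streaming-consumer", {topic: 1})
    # Each record is a (key, value) tuple; keep only the value.
    samples = kvs.map(lambda kv: kv[1]).flatMap(parse_samples)
    # count() returns a DStream; pprint() registers the output operation.
    samples.count().pprint()
    ssc.start()
    ssc.awaitTermination()


# Start the job only when both arguments (ZooKeeper address, topic) are given.
if __name__ == "__main__" and len(sys.argv) == 3:
    main(*sys.argv[1:])
```

Note that count() here is computed per 2-second batch, so the printed numbers are the sizes of the individual batches rather than a single total of 36000.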

Hope this helps.
