
Spark Structured Streaming with Python and Kafka fails with an error

I get the error below when trying to start a readStream for Kafka. Kafka is up and running, and I have tested it multiple times to confirm it is processing messages. The Kafka topic has been created as well.

```
kafka_df = spark.readStream \
        .format("kafka") \
        .option("kafka.bootstrap.servers", "localhost:9092") \
        .option("subscribe", "mytopic") \
        .option("startingOffsets", "earliest") \
        .load()
```

Traceback (most recent call last):
  File "C:/Users//PycharmProjects/SparkStreaming/PySparkKafkaStreaming.py", line 18, in <module>
    kafka_df = spark.readStream
  File "C:\Users<username>\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pyspark\sql\streaming.py", line 420, in load
    return self._df(self._jreader.load())
  File "C:\Users<username>\AppData\Local\Programs\Python\Python38-32\lib\site-packages\py4j\java_gateway.py", line 1304, in __call__
    return_value = get_return_value(
  File "C:\Users<username>\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pyspark\sql\utils.py", line 134, in deco
    raise_from(converted)
  File "<string>", line 3, in raise_from
pyspark.sql.utils.AnalysisException: Failed to find data source: kafka. Please deploy the application as per the deployment section of "Structured Streaming + Kafka Integration Guide".;

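You need to add the Kafka dependencies to run this; the Kafka source is not bundled with PySpark. You can either download the connector JAR and put it in the spark/jars directory, or declare the dependency in the SparkSession's initial config. For details, follow the deployment section of the "Structured Streaming + Kafka Integration Guide" in the Spark docs.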
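As a rough illustration of the second option, here is a minimal sketch that declares the connector through `spark.jars.packages` when the session is built. The helper names `kafka_package`, `build_session` and `read_kafka` are made up for this example, and the default Scala version `2.12` is an assumption; the coordinate has to match the Spark and Scala versions you actually have installed. The setting only takes effect if it is applied before the first SparkSession (and its JVM) is created.

```python
def kafka_package(spark_version, scala_version="2.12"):
    """Build the Maven coordinate of the Kafka source for a given Spark build.

    Hypothetical helper: scala_version="2.12" is an assumption and must
    match the Scala build of your Spark installation.
    """
    return f"org.apache.spark:spark-sql-kafka-0-10_{scala_version}:{spark_version}"


def build_session(spark_version):
    """Create a SparkSession that pulls in the Kafka connector at startup."""
    # Imported lazily so kafka_package() can be used without pyspark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName("PySparkKafkaStreaming")
        # Must be set before the session is created for the first time.
        .config("spark.jars.packages", kafka_package(spark_version))
        .getOrCreate()
    )


def read_kafka(spark):
    """Same readStream call as in the question, unchanged."""
    return (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", "mytopic")
        .option("startingOffsets", "earliest")
        .load()
    )
```

To try it, something like `import pyspark; spark = build_session(pyspark.__version__); kafka_df = read_kafka(spark)` should work, assuming the machine can reach a Maven repository to download the package on first run.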

I hope this helps. Feel free to ask if anything is unclear, thanks!
