
Spark Structured Streaming using Python and Kafka giving error

I am getting the below error when trying to initiate a readStream for Kafka. My Kafka broker is up and running and I have tested it multiple times to make sure it is processing messages. The Kafka topic has been created as well.


# Create a streaming DataFrame that reads from the Kafka topic
kafka_df = spark.readStream \
        .format("kafka") \
        .option("kafka.bootstrap.servers", "localhost:9092") \
        .option("subscribe", "mytopic") \
        .option("startingOffsets", "earliest") \
        .load()


Traceback (most recent call last):
  File "C:/Users//PycharmProjects/SparkStreaming/PySparkKafkaStreaming.py", line 18, in <module>
    kafka_df = spark.readStream \
  File "C:\Users\<username>\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pyspark\sql\streaming.py", line 420, in load
    return self._df(self._jreader.load())
  File "C:\Users\<username>\AppData\Local\Programs\Python\Python38-32\lib\site-packages\py4j\java_gateway.py", line 1304, in __call__
    return_value = get_return_value(
  File "C:\Users\<username>\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pyspark\sql\utils.py", line 134, in deco
    raise_from(converted)
  File "<string>", line 3, in raise_from
pyspark.sql.utils.AnalysisException: Failed to find data source: kafka. Please deploy the application as per the deployment section of "Structured Streaming + Kafka Integration Guide".;

You need to add the Kafka dependencies to run this. For PySpark, you can either download the spark-sql-kafka jar and put it in the spark/jars directory, or declare the dependency in the SparkSession's initial config. Please follow the Structured Streaming + Kafka Integration Guide docs. A sketch of the second approach is shown below.
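For example, a minimal sketch of declaring the dependency in code, assuming a Spark 3.0.x installation built against Scala 2.12 (the Maven coordinate and version below are only an example and must match your own Spark/Scala versions):

from pyspark.sql import SparkSession

# Pull in the Kafka source for Structured Streaming when the session starts.
# Adjust the coordinate to your Spark/Scala versions; 3.0.1 / 2.12 is an example.
spark = SparkSession.builder \
        .appName("PySparkKafkaStreaming") \
        .config("spark.jars.packages",
                "org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1") \
        .getOrCreate()

# Now the kafka format can be resolved and readStream works as in the question
kafka_df = spark.readStream \
        .format("kafka") \
        .option("kafka.bootstrap.servers", "localhost:9092") \
        .option("subscribe", "mytopic") \
        .option("startingOffsets", "earliest") \
        .load()

Equivalently, you can pass the same coordinate on the command line with spark-submit --packages instead of setting it in the SparkSession config.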

I hope this helps. Feel free to ask me anything else, thanks!

