Spark Structured Streaming Batch
I am running a batch query with Spark Structured Streaming. The snippet below throws the error "kafka is not a valid Spark SQL Data Source;". The dependency I am using is spark-sql-kafka-0-10_2.10. Your help is appreciated. Thanks.
Dataset<Row> df = spark
.read()
.format("kafka")
.option("kafka.bootstrap.servers", "*****")
.option("subscribePattern", "test.*")
.option("startingOffsets", "earliest")
.option("endingOffsets", "latest")
.load();
Exception in thread "main" org.apache.spark.sql.AnalysisException: kafka is not a valid Spark SQL Data Source.;
I had the same problem, and like me you are using read instead of readStream. Changing spark.read() to spark.readStream() worked fine for me.
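For reference, a minimal sketch of the streaming variant of the snippet in the question (assuming Spark 2.x with the matching spark-sql-kafka package on the classpath; the bootstrap server address is a placeholder). Note that the endingOffsets option is dropped here, since it applies only to batch queries, and a streaming query must end in a sink such as the console:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class KafkaStreamSketch {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("KafkaStreamSketch")
                .getOrCreate();

        // readStream() (instead of read()) returns a streaming Dataset
        // backed by the Kafka source.
        Dataset<Row> df = spark
                .readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "host1:9092") // placeholder
                .option("subscribePattern", "test.*")
                .option("startingOffsets", "earliest")
                .load();

        // Kafka rows carry binary key/value columns; cast them for inspection,
        // then write to the console sink for debugging.
        df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
                .writeStream()
                .format("console")
                .start()
                .awaitTermination();
    }
}
```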
Use the spark-submit mechanism and pass along --jars spark-sql-kafka-0-10_2.11-2.1.1.jar.
Adjust the Kafka, Scala, and Spark versions in that library coordinate according to your own setup.
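As an alternative to shipping the jar file yourself, spark-submit can resolve the package from Maven at submit time with --packages. A sketch of the invocation, where the coordinates match Spark 2.1.1 on Scala 2.11 and the application class and jar path are placeholders:

```shell
# Pull the Kafka source from Maven Central at submit time.
# Artifact suffix _2.11 is the Scala version; 2.1.1 is the Spark version.
# Adjust both to match your cluster.
spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.1 \
  --class com.example.KafkaJob \
  target/my-app.jar
```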