
Spark Structured Streaming Batch

I am running a batch job with Spark Structured Streaming. The snippet below throws the error "kafka is not a valid Spark SQL Data Source;". The version I am using is spark-sql-kafka-0-10_2.10. Your help is appreciated. Thanks.

Dataset<Row> df = spark
    .read()         
    .format("kafka")
    .option("kafka.bootstrap.servers", "*****")
    .option("subscribePattern", "test.*")
    .option("startingOffsets", "earliest")
    .option("endingOffsets", "latest")
    .load();
Exception in thread "main" org.apache.spark.sql.AnalysisException: kafka is not a valid Spark SQL Data Source.;

I had the same problem: like me, you are using read instead of readStream.

Changing spark.read() to spark.readStream() worked fine for me.
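A minimal sketch of that change, keeping the question's options (the bootstrap-servers placeholder is left as-is, and the console sink and class name are illustrative assumptions, not part of the original question):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class KafkaStreamRead {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("KafkaStreamRead")
                .getOrCreate();

        // readStream() (instead of read()) treats kafka as a streaming source
        Dataset<Row> df = spark
                .readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "*****")  // placeholder from the question
                .option("subscribePattern", "test.*")
                .option("startingOffsets", "earliest")
                // note: endingOffsets applies only to batch queries,
                // so it is dropped for the streaming version
                .load();

        // materialize the stream with a console sink just for illustration
        StreamingQuery query = df
                .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
                .writeStream()
                .format("console")
                .start();

        query.awaitTermination();
    }
}
```

Note that `endingOffsets` is only valid for batch reads; a streaming query will reject it, which is why the sketch above omits it.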

Use the spark-submit mechanism and pass along --jars spark-sql-kafka-0-10_2.11-2.1.1.jar

Adjust the Kafka, Scala, and Spark versions of that library according to your own setup.
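For example, with Spark 2.1.1 built against Scala 2.11, the submit command might look like the sketch below (the main class and application jar names are placeholders; `--packages` is shown as an alternative that resolves the connector from Maven Central instead of a local jar):

```shell
# option 1: ship a local connector jar alongside the application
spark-submit \
  --jars spark-sql-kafka-0-10_2.11-2.1.1.jar \
  --class com.example.KafkaStreamRead \
  my-app.jar

# option 2: let spark-submit resolve the connector by Maven coordinates,
# org.apache.spark:spark-sql-kafka-0-10_<scala version>:<spark version>
spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.1 \
  --class com.example.KafkaStreamRead \
  my-app.jar
```

Without the connector on the classpath, Spark cannot find the "kafka" data source, which produces exactly the AnalysisException shown in the question.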
