
Error when connecting Spark Structured Streaming + Kafka

I'm trying to connect my Structured Streaming Spark 2.4.5 application to Kafka, but every time I try, this Data Source Provider error appears. Here are my Scala code and my sbt build:

import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.streaming.Trigger

object streaming_app_demo {
  def main(args: Array[String]): Unit = {

    println("Spark Structured Streaming with Kafka Demo Application Started ...")

    val KAFKA_TOPIC_NAME_CONS = "test"
    val KAFKA_OUTPUT_TOPIC_NAME_CONS = "test"
    val KAFKA_BOOTSTRAP_SERVERS_CONS = "localhost:9092"


    val spark = SparkSession.builder
      .master("local[*]")
      .appName("Spark Structured Streaming with Kafka Demo")
      .getOrCreate()

    spark.sparkContext.setLogLevel("ERROR")

    // Stream from Kafka
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", KAFKA_BOOTSTRAP_SERVERS_CONS)
      .option("subscribe", KAFKA_TOPIC_NAME_CONS)
      .option("startingOffsets", "latest")
      .load()

    val ds = df
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", KAFKA_BOOTSTRAP_SERVERS_CONS)
      .option("topic", "test2")
      // the Kafka sink requires a checkpoint location
      .option("checkpointLocation", "/tmp/kafka-sink-checkpoint")
      .start()

    // block the main thread so the streaming query keeps running
    ds.awaitTermination()
  }
}

And the error is:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Failed to find data source: kafka. Please deploy the application as per the deployment section of "Structured Streaming + Kafka Integration Guide".;
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:652)
    at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:161)
    at streaming_app_demo$.main(teste.scala:29)
    at streaming_app_demo.main(teste.scala)

And my build.sbt is:

name := "scala_212"

version := "0.1"

scalaVersion := "2.12.11"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.5"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.5"

libraryDependencies += "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.4.5" % "provided"

libraryDependencies += "org.apache.kafka" % "kafka-clients" % "2.5.0"

Thank you!

For Spark Structured Streaming + Kafka, the spark-sql-kafka-0-10 library is required.

You are getting this org.apache.spark.sql.AnalysisException: Failed to find data source: kafka exception because the spark-sql-kafka library is not available on your classpath, so Spark is unable to find org.apache.spark.sql.sources.DataSourceRegister inside the META-INF/services folder.
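In the build.sbt from the question, spark-sql-kafka-0-10 is marked `provided`, which keeps it off the runtime classpath when running locally (e.g. from sbt or an IDE). A minimal fix, assuming a local run rather than a cluster deploy, is to drop that scope:

```scala
// build.sbt — compile scope (no "provided"), so the Kafka source
// connector is on the runtime classpath and its DataSourceRegister
// service entry can be found
libraryDependencies += "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.4.5"
```

Keep `provided` only if something else (e.g. spark-submit `--packages` or jars already on the cluster) supplies the connector at runtime.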

DataSourceRegister path inside the jar file:

/org/apache/spark/spark-sql-kafka-0-10_2.11/2.2.0/spark-sql-kafka-0-10_2.11-2.2.0.jar./META-INF/services/org.apache.spark.sql.sources.DataSourceRegister
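Alternatively, if you keep the dependency `provided`, you can supply the connector at launch time with `--packages`. A sketch, assuming the Scala 2.12 / Spark 2.4.5 build from the question (the jar path below is sbt's default output location and is an assumption):

```shell
# fetches spark-sql-kafka-0-10 and its transitive dependencies
# from Maven Central at submit time
spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:2.4.5 \
  --class streaming_app_demo \
  target/scala-2.12/scala_212_2.12-0.1.jar
```

The artifact's Scala suffix (`_2.12`) must match your project's scalaVersion, and the version must match your Spark version.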

Update

If you are using SBT, try adding the code block below. This will include the org.apache.spark.sql.sources.DataSourceRegister file in your final jar.

// META-INF discarding
assemblyMergeStrategy in assembly := {
  case PathList("META-INF","services",xs @ _*) => MergeStrategy.filterDistinctLines
  case PathList("META-INF",xs @ _*) => MergeStrategy.discard
  case "application.conf" => MergeStrategy.concat
  case _ => MergeStrategy.first
}
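The merge strategy above assumes the sbt-assembly plugin is already enabled; if it is not, a typical plugins entry looks like the following (the plugin version is an assumption — use whichever release matches your sbt version):

```scala
// project/plugins.sbt — sbt-assembly provides the `assembly` task,
// assemblyMergeStrategy, and the MergeStrategy / PathList helpers
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
```

The important line for this error is the `filterDistinctLines` case: it merges the META-INF/services registration files from all jars instead of discarding them, so the kafka data source stays registered in the fat jar.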

