
Kafka Spark Streaming Error - java.lang.NoClassDefFoundError: org/apache/spark/sql/connector/read/streaming/ReportsSourceMetrics

I'm using Spark 3.1.2, Kafka 2.8.1 & Scala 2.12.10.

I'm getting the error below while integrating Kafka and Spark streaming:

java.lang.NoClassDefFoundError: org/apache/spark/sql/connector/read/streaming/ReportsSourceMetrics

spark-shell command with the dependency: spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2

 org.apache.spark#spark-sql-kafka-0-10_2.12 added as a dependency
 :: resolving dependencies :: org.apache.spark#spark-submit-parent-3643b83d-a2f8-43d1-941f-a125272f3905;1.0
         confs: [default]
         found org.apache.spark#spark-sql-kafka-0-10_2.12;3.1.2 in central
         found org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.1.2 in central
         found org.apache.kafka#kafka-clients;2.6.0 in central
         found com.github.luben#zstd-jni;1.4.8-1 in central
         found org.lz4#lz4-java;1.7.1 in central
         found org.xerial.snappy#snappy-java;1.1.8.2 in central
         found org.slf4j#slf4j-api;1.7.30 in central
         found org.spark-project.spark#unused;1.0.0 in central
         found org.apache.commons#commons-pool2;2.6.2 in central
 :: resolution report :: resolve 564ms :: artifacts dl 9ms
         :: modules in use:
         com.github.luben#zstd-jni;1.4.8-1 from central in [default]
         org.apache.commons#commons-pool2;2.6.2 from central in [default]
         org.apache.kafka#kafka-clients;2.6.0 from central in [default]
         org.apache.spark#spark-sql-kafka-0-10_2.12;3.1.2 from central in [default]
         org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.1.2 from central in [default]
         org.lz4#lz4-java;1.7.1 from central in [default]
         org.slf4j#slf4j-api;1.7.30 from central in [default]
         org.spark-project.spark#unused;1.0.0 from central in [default]
         org.xerial.snappy#snappy-java;1.1.8.2 from central in [default]
         ---------------------------------------------------------------------
         |                  |            modules            ||   artifacts   |
         |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
         ---------------------------------------------------------------------
         |      default     |   9   |   0   |   0   |   0   ||   9   |   0   |
         ---------------------------------------------------------------------
 :: retrieving :: org.apache.spark#spark-submit-parent-3643b83d-a2f8-43d1-941f-a125272f3905
         confs: [default]
         0 artifacts copied, 9 already retrieved (0kB/15ms)
 21/12/28 17:46:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
 Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
 Setting default log level to "WARN".
 To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
 21/12/28 17:46:28 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
 Spark context Web UI available at http://*******:4041
 Spark context available as 'sc' (master = local[*], app id = local-1640693788919).
 Spark session available as 'spark'.
 Welcome to
       ____              __
      / __/__  ___ _____/ /__
     _\ \/ _ \/ _ `/ __/  '_/
    /___/ .__/\_,_/_/ /_/\_\   version 3.1.2
       /_/
 
 Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_292)
 Type in expressions to have them evaluated.
 Type :help for more information.
    
    val df = spark.readStream.format("kafka").option("kafka.bootstrap.servers", "127.0.1.1:9092").option("subscribe", "Topic").option("startingOffsets", "earliest").load()
    
    df.printSchema()
    
    import org.apache.spark.sql.types._
    import org.apache.spark.sql.functions._   // needed for from_json and col
    val schema = new StructType().add("id",IntegerType).add("fname",StringType).add("lname",StringType)
    val personStringDF = df.selectExpr("CAST(value AS STRING)")
    val personDF = personStringDF.select(from_json(col("value"), schema).as("data")).select("data.*")
     
    personDF.writeStream.format("console").outputMode("append").start().awaitTermination()
    
    Exception in thread "stream execution thread for [id = 44e8f8bf-7d94-4313-9d2b-88df8f5bc10f, runId = 3b4c63c4-9062-4288-a681-7dd6cfb836d0]" java.lang.NoClassDefFoundError: org/apache/spark/sql/connector/read/streaming/ReportsSourceMetrics

I had nearly the same problem: the same exception, but in spark-submit. I solved it by upgrading Spark to version 3.2.0. (ReportsSourceMetrics was introduced in Spark 3.2.0, so this error suggests a connector jar built against 3.2.x ended up on the classpath of an older Spark runtime.) I also used version 3.2.0 of org.apache.spark:spark-sql-kafka-0-10_2.12, with the full command being:

$SPARK_HOME/bin/spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.0 script.py
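
The contents of script.py are not shown here; purely for illustration, a minimal Scala sketch of such a Kafka-to-console streaming job could look like the following (the broker address, topic name, and object name are placeholders, not taken from the original):

import org.apache.spark.sql.SparkSession

object KafkaConsoleJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("kafka-console-job").getOrCreate()
    // Read from Kafka; broker and topic are placeholder values.
    val df = spark.readStream.
      format("kafka").
      option("kafka.bootstrap.servers", "127.0.0.1:9092").
      option("subscribe", "kafka-spark-test").
      option("startingOffsets", "earliest").
      load()
    // Cast the binary value column to string and stream it to the console.
    df.selectExpr("CAST(value AS STRING)").
      writeStream.
      format("console").
      outputMode("append").
      start().
      awaitTermination()
  }
}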

Spark_version 3.1.2

Scala_version 2.12.10

Kafka_version 2.8.1

Note: The versions are very important when we use --packages org.apache.spark:spark-sql-kafka-0-10_2.12:VVV with either spark-shell or spark-submit, where VVV = Spark_version.
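
As a quick sanity check, you can build the matching package coordinate from the running session itself in spark-shell (a minimal sketch; the val name is just for illustration):

scala> val coordinate = s"org.apache.spark:spark-sql-kafka-0-10_${scala.util.Properties.versionNumberString.split('.').take(2).mkString(".")}:${spark.version}"
coordinate: String = org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2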

I followed the steps below, as given at spark-kafka-example:

  1. Start the producer: $ kafka-console-producer.sh --broker-list Kafka-Server-IP:9092 --topic kafka-spark-test

You should see the prompt > on the console. Enter some test data in the producer:

>{"name":"foo","dob_year":1995,"gender":"M","salary":2000}
>{"name":"bar","dob_year":1996,"gender":"M","salary":2500}
>{"name":"baz","dob_year":1997,"gender":"F","salary":3500}
>{"name":"foo-bar","dob_year":1998,"gender":"M","salary":4000}

  2. Start the spark-shell as follows:
spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2

Note: I have used 3.1.2. You will see something like the following on a successful start:

Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.2
      /_/
         
Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 11.0.13)
Type in expressions to have them evaluated.
Type :help for more information.

  3. Enter the imports, create the DataFrame, and print the schema:
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._

val df = spark.readStream.
      format("kafka").
      option("kafka.bootstrap.servers", "Kafka-Server-IP:9092").
      option("subscribe", "kafka-spark-test").
      option("startingOffsets", "earliest").
      load()

df.printSchema()
  4. Successful execution should print the following:
scala> df.printSchema()
root
 |-- key: binary (nullable = true)
 |-- value: binary (nullable = true)
 |-- topic: string (nullable = true)
 |-- partition: integer (nullable = true)
 |-- offset: long (nullable = true)
 |-- timestamp: timestamp (nullable = true)
 |-- timestampType: integer (nullable = true)


  5. Convert the binary values of the DataFrame to string. I am showing the command with its output:
scala>      val personStringDF = df.selectExpr("CAST(value AS STRING)")


personStringDF: org.apache.spark.sql.DataFrame = [value: string]

  6. Make a schema for the DataFrame. I am showing the command with its output:
scala> val schema = new StructType().
     |       add("name",StringType).
     |       add("dob_year",IntegerType).
     |       add("gender",StringType).
     |       add("salary",IntegerType)


schema: org.apache.spark.sql.types.StructType = StructType(StructField(name,StringType,true), StructField(dob_year,IntegerType,true), StructField(gender,StringType,true), StructField(salary,IntegerType,true))

  7. Select the data:
scala>  val personDF = personStringDF.select(from_json(col("value"), schema).as("data")).select("data.*")

personDF: org.apache.spark.sql.DataFrame = [name: string, dob_year: int ... 2 more fields]

  8. Write the stream to the console:

scala>  personDF.writeStream.
     |       format("console").
     |       outputMode("append").
     |       start().
     |       awaitTermination()


You will see the following output:

-------------------------------------------                                     
Batch: 0
-------------------------------------------
+-------+--------+------+------+
|   name|dob_year|gender|salary|
+-------+--------+------+------+
|    foo|    1995|     M|  2000|
|    bar|    1996|     M|  2500|
|    baz|    1997|     F|  3500|
|foo-bar|    1998|     M|  4000|
+-------+--------+------+------+

If your Kafka producer is still running, you may enter a new row, and you will see the new data in Batch: 1, and so on each time you enter new data in the producer.
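
For example, entering one more (hypothetical) row in the producer:

>{"name":"qux","dob_year":1999,"gender":"F","salary":4500}

should produce something like:

-------------------------------------------
Batch: 1
-------------------------------------------
+----+--------+------+------+
|name|dob_year|gender|salary|
+----+--------+------+------+
| qux|    1999|     F|  4500|
+----+--------+------+------+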

This is a typical example of entering data from the console producer and consuming it in the Spark shell.

Good luck :)

Just check your Spark version with spark.version and adjust the packages as suggested in the other answers.
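
For example, in spark-shell (the value returned will match your own installation; 3.1.2 corresponds to the version in the question):

scala> spark.version
res0: String = 3.1.2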
