Kafka Spark Streaming Error - java.lang.NoClassDefFoundError: org/apache/spark/sql/connector/read/streaming/ReportsSourceMetrics
I am using Spark 3.1.2, Kafka 2.8.1 and Scala 2.12.1.
I am getting the following error while integrating Kafka with Spark Streaming:
java.lang.NoClassDefFoundError: org/apache/spark/sql/connector/read/streaming/ReportsSourceMetrics
The spark-shell command with the dependency: spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2
org.apache.spark#spark-sql-kafka-0-10_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-3643b83d-a2f8-43d1-941f-a125272f3905;1.0
confs: [default]
found org.apache.spark#spark-sql-kafka-0-10_2.12;3.1.2 in central
found org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.1.2 in central
found org.apache.kafka#kafka-clients;2.6.0 in central
found com.github.luben#zstd-jni;1.4.8-1 in central
found org.lz4#lz4-java;1.7.1 in central
found org.xerial.snappy#snappy-java;1.1.8.2 in central
found org.slf4j#slf4j-api;1.7.30 in central
found org.spark-project.spark#unused;1.0.0 in central
found org.apache.commons#commons-pool2;2.6.2 in central
:: resolution report :: resolve 564ms :: artifacts dl 9ms
:: modules in use:
com.github.luben#zstd-jni;1.4.8-1 from central in [default]
org.apache.commons#commons-pool2;2.6.2 from central in [default]
org.apache.kafka#kafka-clients;2.6.0 from central in [default]
org.apache.spark#spark-sql-kafka-0-10_2.12;3.1.2 from central in [default]
org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.1.2 from central in [default]
org.lz4#lz4-java;1.7.1 from central in [default]
org.slf4j#slf4j-api;1.7.30 from central in [default]
org.spark-project.spark#unused;1.0.0 from central in [default]
org.xerial.snappy#snappy-java;1.1.8.2 from central in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 9 | 0 | 0 | 0 || 9 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-3643b83d-a2f8-43d1-941f-a125272f3905
confs: [default]
0 artifacts copied, 9 already retrieved (0kB/15ms)
21/12/28 17:46:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/12/28 17:46:28 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark context Web UI available at http://*******:4041
Spark context available as 'sc' (master = local[*], app id = local-1640693788919).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.1.2
/_/
Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_292)
Type in expressions to have them evaluated.
Type :help for more information.
val df = spark.readStream.format("kafka").option("kafka.bootstrap.servers", "127.0.1.1:9092").option("subscribe", "Topic").option("startingOffsets", "earliest").load()
df.printSchema()
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions.{from_json, col}
val schema = new StructType().add("id",IntegerType).add("fname",StringType).add("lname",StringType)
val personStringDF = df.selectExpr("CAST(value AS STRING)")
val personDF = personStringDF.select(from_json(col("value"), schema).as("data")).select("data.*")
personDF.writeStream.format("console").outputMode("append").start().awaitTermination()
Exception in thread "stream execution thread for [id = 44e8f8bf-7d94-4313-9d2b-88df8f5bc10f, runId = 3b4c63c4-9062-4288-a681-7dd6cfb836d0]" java.lang.NoClassDefFoundError: org/apache/spark/sql/connector/read/streaming/ReportsSourceMetrics
I faced almost the same problem - the same exception, but with spark-submit. I solved it by upgrading Spark to version 3.2.0, and I also used version 3.2.0 of org.apache.spark:spark-sql-kafka-0-10_2.12. The full command was:
$SPARK_HOME/bin/spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.2.0 script.py
Spark_version 3.1.2
Scala_version 2.12.10
Kafka_version 2.8.1
Note: the version matters when using --packages org.apache.spark:spark-sql-kafka-0-10_2.12:VVV with spark-shell or spark-submit, where VVV = Spark_version.
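As a sanity check, the coordinate can be assembled mechanically from the Scala binary version and the Spark version. A minimal sketch (the helper name kafka_package is made up for illustration, not part of any Spark API):

```python
def kafka_package(spark_version, scala_binary="2.12"):
    # The artifact version (VVV above) must equal the Spark version.
    return f"org.apache.spark:spark-sql-kafka-0-10_{scala_binary}:{spark_version}"

print(kafka_package("3.1.2"))
# org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2
```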
I followed the steps below from spark-kafka-example:
$ kafka-console-producer.sh --broker-list Kafka-Server-IP:9092 --topic kafka-spark-test
You should see the prompt > on the console. Enter some test data into the producer:
>{"name":"foo","dob_year":1995,"gender":"M","salary":2000}
>{"name":"bar","dob_year":1996,"gender":"M","salary":2500}
>{"name":"baz","dob_year":1997,"gender":"F","salary":3500}
>{"name":"foo-bar","dob_year":1998,"gender":"M","salary":4000}
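The from_json step later in this walkthrough pulls name, dob_year, gender and salary out of each message value. What it does to one of these lines can be sketched in plain Python (standard-library json only, no Spark required):

```python
import json

line = '{"name":"foo","dob_year":1995,"gender":"M","salary":2000}'

# Mirror the four StructType fields used in the Spark schema below.
fields = ("name", "dob_year", "gender", "salary")
record = json.loads(line)
row = tuple(record.get(f) for f in fields)
print(row)
# ('foo', 1995, 'M', 2000)
```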
Start spark-shell as follows: spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2
Note: I am using 3.1.2. After a successful start you will see something like:
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.1.2
/_/
Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 11.0.13)
Type in expressions to have them evaluated.
Type :help for more information.
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions.{from_json, col}
val df = spark.readStream.
format("kafka").
option("kafka.bootstrap.servers", "Kafka-Server-IP:9092").
option("subscribe", "kafka-spark-test").
option("startingOffsets", "earliest").
load()
scala> df.printSchema()
root
|-- key: binary (nullable = true)
|-- value: binary (nullable = true)
|-- topic: string (nullable = true)
|-- partition: integer (nullable = true)
|-- offset: long (nullable = true)
|-- timestamp: timestamp (nullable = true)
|-- timestampType: integer (nullable = true)
scala> val personStringDF = df.selectExpr("CAST(value AS STRING)")
personStringDF: org.apache.spark.sql.DataFrame = [value: string]
scala> val schema = new StructType().
| add("name",StringType).
| add("dob_year",IntegerType).
| add("gender",StringType).
| add("salary",IntegerType)
schema: org.apache.spark.sql.types.StructType = StructType(StructField(name,StringType,true), StructField(dob_year,IntegerType,true), StructField(gender,StringType,true), StructField(salary,IntegerType,true))
scala> val personDF = personStringDF.select(from_json(col("value"), schema).as("data")).select("data.*")
personDF: org.apache.spark.sql.DataFrame = [name: string, dob_year: int ... 2 more fields]
scala> personDF.writeStream.
| format("console").
| outputMode("append").
| start().
| awaitTermination()
You will see the following output:
-------------------------------------------
Batch: 0
-------------------------------------------
+-------+--------+------+------+
| name|dob_year|gender|salary|
+-------+--------+------+------+
| foo| 1995| M| 2000|
| bar| 1996| M| 2500|
| baz| 1997| F| 3500|
|foo-bar| 1998| M| 4000|
+-------+--------+------+------+
If your Kafka producer is still running you can type a new line, and every time you enter new data in the producer you will see it appear in Batch: 1, and so on.
This is a typical example of entering data in a console producer and consuming it in the Spark console.
Good luck! :)
Just check your Spark version with spark.version, then adjust the package version as suggested in the other answers.
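For context, org.apache.spark.sql.connector.read.streaming.ReportsSourceMetrics appears to have been added in Spark 3.2.0, so this error usually means the connector jar and the Spark runtime come from different release lines. A toy version-match check (illustrative only, not a Spark API):

```python
def connector_matches(spark_version, connector_version):
    # The spark-sql-kafka connector should come from the same Spark release.
    return spark_version == connector_version

print(connector_matches("3.1.2", "3.2.0"))  # False -> expect NoClassDefFoundError
print(connector_matches("3.2.0", "3.2.0"))  # True
```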