
java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArraySerializer for Spark Streaming

I am trying to run a Spark Streaming application on Ubuntu, but I get some errors. For some reason Ubuntu 22.04 does not locate the jar files, despite the fact that the same configuration works on Windows.

I run the following configuration in a script:

    spark = SparkSession \
        .builder \
        .appName("File Streaming PostgreSQL") \
        .master("local[3]") \
        .config("spark.streaming.stopGracefullyOnShutdown", "true") \
        .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.3.0,org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0") \
        .config("spark.sql.shuffle.partitions", 2) \
        .getOrCreate()

In addition to that, I downloaded and placed all the jar files for Avro and the Kafka SQL connector in /usr/local/spark/jars:

  • spark-sql-kafka-0-10_2.12-3.3.0.jar
  • spark-sql-kafka-0-10_2.12-3.3.0-tests.jar
  • spark-sql-kafka-0-10_2.12-3.3.0-javadoc.jar
  • spark-sql-kafka-0-10_2.12-3.3.0-sources.jar
  • spark-sql-kafka-0-10_2.12-3.3.0-test-sources.jar
  • spark-avro_2.12-3.3.0.jar
  • spark-avro_2.12-3.3.0-tests.jar
  • spark-avro_2.12-3.3.0-javadoc.jar
  • spark-avro_2.12-3.3.0-sources.jar
  • spark-avro_2.12-3.3.0-test-sources.jar

My Spark version is 3.3.0, my Scala version is 2.12.14, and the JVM is OpenJDK 64-Bit Server VM (Java 11.0.16), but I get the following error:

File "/usr/local/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o39.load.
: java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArraySerializer
        at org.apache.spark.sql.kafka010.KafkaSourceProvider$.<init>(KafkaSourceProvider.scala:601)
        at org.apache.spark.sql.kafka010.KafkaSourceProvider$.<clinit>(KafkaSourceProvider.scala)
        at org.apache.spark.sql.kafka010.KafkaSourceProvider.org$apache$spark$sql$kafka010$KafkaSourceProvider$$validateStreamOptions(KafkaSourceProvider.scala:338)
        at org.apache.spark.sql.kafka010.KafkaSourceProvider.sourceSchema(KafkaSourceProvider.scala:71)
        at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:236)
        at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:118)
        at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:118)
        at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:34)
        at org.apache.spark.sql.streaming.DataStreamReader.loadInternal(DataStreamReader.scala:168)
        at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:144)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.serialization.ByteArraySerializer
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
        at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
        ... 22 more

This problem persists even though I have the following configuration in .bashrc:

#configuration for local Spark and Hadoop
SPARK_HOME=/usr/local/spark-3.3.0-bin-hadoop3
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
export PYSPARK_PYTHON=/usr/bin/python3
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin

It is important to note that everything works fine on Windows, and other applications that do not use the Kafka SQL connector and Avro serialization also work just fine on Ubuntu 22.04.

You need kafka-clients.jar for the mentioned class.
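kafka-clients is a transitive dependency of the Kafka connector, so `spark.jars.packages` would normally resolve it automatically; when copying jars by hand you have to fetch it yourself. Each Maven coordinate maps to a predictable Maven Central path, which a small helper can build (the kafka-clients version below is an assumption; check the POM of spark-sql-kafka-0-10_2.12:3.3.0 for the exact version it declares):

```python
def maven_central_url(group, artifact, version):
    """Map a Maven coordinate to the URL of its jar on Maven Central."""
    return (
        "https://repo1.maven.org/maven2/"
        f"{group.replace('.', '/')}/{artifact}/{version}/"
        f"{artifact}-{version}.jar"
    )

# kafka-clients provides the missing ByteArraySerializer class; 3.1.1 is
# an assumed version -- verify it against the connector's POM.
print(maven_central_url("org.apache.kafka", "kafka-clients", "3.1.1"))
# -> https://repo1.maven.org/maven2/org/apache/kafka/kafka-clients/3.1.1/kafka-clients-3.1.1.jar
```

Downloading that URL into the same /usr/local/spark/jars directory (along with the connector's other runtime dependencies) is the manual equivalent of what `spark.jars.packages` does at startup.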

You don't need Spark tests, sources, or Javadoc jars in your Spark runtime.
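Trimming those artifacts out of the jars directory can be done with a glob (a sketch against the path used in the question; adjust `JARS_DIR` to your install):

```shell
# Remove test, source, and javadoc artifacts; only the main jars are
# needed at runtime. Unmatched globs are ignored by rm -f.
JARS_DIR="${JARS_DIR:-/usr/local/spark/jars}"
rm -f -- "$JARS_DIR"/*-tests.jar "$JARS_DIR"/*-test-sources.jar \
         "$JARS_DIR"/*-sources.jar "$JARS_DIR"/*-javadoc.jar
```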

Also, if you're trying to use Avro with a Schema Registry, then spark-avro isn't what you want.
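Once the jar set is settled, a quick way to verify that the missing class is actually present somewhere under the jars directory is to scan the jars for it (a standard-library-only sketch; the path is the one from the question):

```python
import glob
import os
import zipfile

def jars_containing(jar_dir, class_name):
    """Return the names of jars under jar_dir that contain class_name.

    A jar is just a zip archive, so we can look for the .class entry
    directly -- useful for confirming kafka-clients is on the classpath.
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(glob.glob(os.path.join(jar_dir, "*.jar"))):
        with zipfile.ZipFile(jar) as zf:
            if entry in zf.namelist():
                hits.append(os.path.basename(jar))
    return hits

# e.g. jars_containing("/usr/local/spark/jars",
#     "org.apache.kafka.common.serialization.ByteArraySerializer")
# returns [] exactly when the NoClassDefFoundError above would occur.
```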
