PySpark - NoClassDefFoundError: kafka/common/TopicAndPartition
java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition
When the following statement runs in my code:
kafka_streams = [KafkaUtils.createStream(ssc, zk_settings['QUORUM'], zk_settings['CONSUMERS'][k],
                                         {zk_settings['TOPICS'][0]: zk_settings['NUM_THREADS']})
                     .window(zk_settings['WINDOW_DURATION'], zk_settings['SLIDE_DURATION'])
                 for k in range(len(zk_settings['CONSUMERS']))]
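For context, the list comprehension above assumes a `zk_settings` dict shaped roughly like the one below. All concrete values here (hosts, group names, topic, durations) are hypothetical placeholders, not taken from the question:

```python
# Hypothetical configuration assumed by the snippet above;
# every value is a placeholder for illustration only.
zk_settings = {
    'QUORUM': 'localhost:2181',           # ZooKeeper quorum address
    'CONSUMERS': ['group-a', 'group-b'],  # one consumer group per stream
    'TOPICS': ['test'],                   # Kafka topic names
    'NUM_THREADS': 1,                     # receiver threads per topic
    'WINDOW_DURATION': 60,                # window length, seconds
    'SLIDE_DURATION': 10,                 # slide interval, seconds
}

# Mirror how the comprehension indexes the dict: one {topic: threads}
# map is passed to each createStream call.
topic_map = {zk_settings['TOPICS'][0]: zk_settings['NUM_THREADS']}
print(topic_map)
```

With this shape, the comprehension creates one windowed stream per entry in `CONSUMERS`.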
I get the following error:
Exception in thread "Thread-3" java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2625)
at java.lang.Class.privateGetPublicMethods(Class.java:2743)
at java.lang.Class.getMethods(Class.java:1480)
at py4j.reflection.ReflectionEngine.getMethodsByNameAndLength(ReflectionEngine.java:365)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:317)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
at py4j.Gateway.invoke(Gateway.java:252)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: kafka.common.TopicAndPartition
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 12 more
Am I missing something?
I was getting some Spark errors, so I rebuilt Spark, and that led to this error.
You should add --packages when you submit your code:
./bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 <DIR>/main.py localhost:9092 test
https://spark.apache.org/docs/latest/streaming-kafka-0-8-integration.html
The reason I ran into this problem was that the spark-streaming-kafka jar I had downloaded was not an assembly jar. I fixed it as follows:
First, download the assembly spark-streaming-kafka...jar:
wget https://repo1.maven.org/maven2/org/apache/spark/spark-streaming-kafka-0-8-assembly_2.11/2.2.0/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar
In my case I was using spark-2.2.0, so browse the URL https://repo1.maven.org/maven2/org/apache/spark/spark-streaming-kafka-0-8-assembly_2.11 to find the matching jar you need to download.
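To make the structure of that download URL explicit, here is a small helper of my own (not part of Spark or any library) that assembles the Maven Central path from the Spark and Scala versions:

```python
# Base path of the Spark artifacts on Maven Central.
MAVEN_BASE = "https://repo1.maven.org/maven2/org/apache/spark"

def assembly_jar_url(spark_version, scala_version="2.11", kafka_api="0-8"):
    """Build the Maven Central URL of the spark-streaming-kafka assembly jar.

    The artifact name follows the pattern
    spark-streaming-kafka-<kafka_api>-assembly_<scala_version>.
    """
    artifact = f"spark-streaming-kafka-{kafka_api}-assembly_{scala_version}"
    return f"{MAVEN_BASE}/{artifact}/{spark_version}/{artifact}-{spark_version}.jar"

# For Spark 2.2.0 this reproduces the wget URL shown above.
print(assembly_jar_url("2.2.0"))
```

Substitute your own Spark version (and Scala version, if different) to get the jar that matches your cluster.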
Then run:
spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 --jars /path/to/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar myApp.py
A ClassNotFoundException means that when your program is run with spark-submit, it cannot find the class it needs, kafka.common.TopicAndPartition, on the runtime classpath.
Take a look at the usage of the spark-submit command:
# spark-submit --help
Usage: spark-submit [options] <app jar | python file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]
Usage: spark-submit run-example [options] example-class [example args]
Options:
--master MASTER_URL spark://host:port, mesos://host:port, yarn, or local.
--deploy-mode DEPLOY_MODE Whether to launch the driver program locally ("client") or
on one of the worker machines inside the cluster ("cluster")
(Default: client).
--class CLASS_NAME Your application's main class (for Java / Scala apps).
--name NAME A name of your application.
--jars JARS Comma-separated list of local jars to include on the driver
and executor classpaths.
--packages Comma-separated list of maven coordinates of jars to include
on the driver and executor classpaths. Will search the local
maven repo, then maven central and any additional remote
repositories given by --repositories. The format for the
coordinates should be groupId:artifactId:version.
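The groupId:artifactId:version format described above can be illustrated with a tiny parser (my own sketch, not part of spark-submit):

```python
def parse_maven_coordinate(coord):
    """Split a Maven coordinate of the form groupId:artifactId:version."""
    parts = coord.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a valid Maven coordinate: {coord!r}")
    group_id, artifact_id, version = parts
    return {"groupId": group_id, "artifactId": artifact_id, "version": version}

# The coordinate used in the spark-submit commands in this thread.
print(parse_maven_coordinate("org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0"))
```

Note that the artifactId carries the Scala version suffix (here `_2.11`), which must match the Scala build of your Spark installation.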
Add the --jars option with the local paths of the kafka jars, like this:
# spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 --jars /path/to/org.apache.kafka_kafka_2.11-0.8.2.1.jar,/path/to/com.yammer.metrics_metrics-core-2.2.0.jar your_python_script.py