
Apache Spark -- using spark-submit throws a NoSuchMethodError

To submit a Spark application to a cluster, the Spark documentation notes:

To do this, create an assembly jar (or “uber” jar) containing your code and its dependencies. Both sbt and Maven have assembly plugins. When creating assembly jars, list Spark and Hadoop as provided dependencies; these need not be bundled since they are provided by the cluster manager at runtime. -- http://spark.apache.org/docs/latest/submitting-applications.html

So, I added the Apache Maven Shade Plugin (version 3.0.0) to my pom.xml file, and I changed my Spark dependency's scope to provided (version 2.1.0).

(I also added the Apache Maven Assembly Plugin to ensure I was wrapping all of my dependencies in the jar when I run mvn clean package. I'm unsure if it's truly necessary.)
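
For context, the relevant dependency declarations look roughly like this (a sketch only; the spark-core_2.11 artifactId is an assumption based on the Scala version Spark 2.1.0 typically ships with):

    <!-- Spark is marked provided so it is not bundled into the uberjar -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.1.0</version>
        <scope>provided</scope>
    </dependency>
    <!-- Guava is a normal compile dependency and does get bundled -->
    <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
        <version>21.0</version>
    </dependency>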


This is how spark-submit fails. It throws a NoSuchMethodError for a dependency I have (note that the code works from a local instance when compiled inside IntelliJ, assuming that provided is removed).

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.createStarted()Lcom/google/common/base/Stopwatch;

The line of code that throws the error is irrelevant: it's simply the first line in my main method that creates a Stopwatch, part of the Google Guava utilities (version 21.0).

Other solutions online suggest that it has to do with version conflicts of Guava, but I haven't had any luck yet with those suggestions. Any help would be appreciated, thank you.

If you take a look at the /jars subdirectory of the Spark 2.1.0 installation, you will likely see guava-14.0.1.jar. Per the API for the Guava Stopwatch#createStarted method you are using, createStarted did not exist until Guava 15.0. What is most likely happening is that the Spark process classloader is finding the Spark-provided Guava 14.0.1 library before it finds the Guava 21.0 library packaged in your uberjar.
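
One quick way to confirm which Guava version the Spark installation ships (a sketch; the path is an assumption and depends on where Spark is installed):

    # SPARK_HOME is assumed to point at the Spark 2.1.0 installation
    ls "$SPARK_HOME/jars" | grep guava
    # expected output: guava-14.0.1.jar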

One possible resolution is to use the class-relocation feature provided by the Maven Shade plugin (which you're already using to construct your uberjar). Via "class relocation", Maven Shade moves the Guava 21.0 classes (needed by your code) during the packaging of the uberjar from a pattern location reflecting their existing package name (e.g. com.google.common.base) to an arbitrary shadedPattern location that you specify in the Shade configuration (e.g. myguava123.com.google.common.base).

The result is that the older and newer Guava libraries no longer share a package name, avoiding the runtime conflict.

Most likely you're having a dependency conflict, yes.

First, you can check whether you have a dependency conflict when you build your jar. A quick way is to look inside your jar directly to see if the Stopwatch.class file is there and, by looking at the bytecode, whether the createStarted method is there. Otherwise you can also list the dependency tree and work from there: https://maven.apache.org/plugins/maven-dependency-plugin/examples/resolving-conflicts-using-the-dependency-tree.html
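
For example (a sketch; the jar path is hypothetical and should match your actual build output):

    # check whether Guava's Stopwatch is bundled in the uberjar
    jar tf target/myapp-1.0.jar | grep 'com/google/common/base/Stopwatch.class'
    # inspect the bundled class for the createStarted method
    javap -classpath target/myapp-1.0.jar com.google.common.base.Stopwatch | grep createStarted
    # list the Maven dependency tree, filtered to Guava
    mvn dependency:tree -Dincludes=com.google.guava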

If it's not an issue with your jar, you might have a dependency issue due to a conflict between your Spark installation and your jar. Look in the lib and jars folders of your Spark installation. There you can see if you have jars that include an alternate version of Guava that wouldn't support the createStarted() method from Stopwatch.

Applying the above answers, the problem can be solved with the following configuration:

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.1.0</version>
    <executions>
      <execution>
        <phase>package</phase>
        <goals>
          <goal>shade</goal>
        </goals>
        <configuration>
          <relocations>
            <relocation>
              <pattern>com.google.common</pattern>
              <shadedPattern>shade.com.google.common</shadedPattern>
            </relocation>
            <relocation>
              <pattern>com.google.thirdparty.publicsuffix</pattern>
              <shadedPattern>shade.com.google.thirdparty.publicsuffix</shadedPattern>
            </relocation>
          </relocations>
        </configuration>
      </execution>
    </executions>
  </plugin>
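
After rebuilding, one way to sanity-check that the relocation took effect is to look for the shaded package prefix inside the uberjar (a sketch; the jar name is hypothetical):

    jar tf target/myapp-1.0.jar | grep 'shade/com/google/common/base/Stopwatch.class'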
