Correct Maven Dependency for Spark API

I am wondering why the following dependency declarations are not enough to get access to the following class:

org.apache.spark.api.java.function.PairFunction

Error at runtime:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/api/java/function/PairFunction
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:278)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.api.java.function.PairFunction
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

Dependencies declared:

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.12.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>2.1.0</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>

I am running the .jar like so:

hadoop jar target/secondarysortspark-1.0-SNAPSHOT.jar   ~/projects/secondarysortspark/secondarysortspark/src/main/java/com/tom/secondarysortspark/data.txt

Thanks

For your Spark version, the Scala version is not right. I have the following properties and it works:

<properties>
    <scala.tools.version>2.10</scala.tools.version>
    <scala.version>2.10.4</scala.version>
    <spark.version>1.6.1</spark.version>
</properties>

A Scala 2.10.x library is required, because your Spark core dependency is declared as spark-core_2.10 (take a look at the spark-core artifact); the scala-library 2.12.1 you declared does not match it.
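For example, a minimal sketch of a consistent pair of declarations (the scala-library version below is an assumption, picked to stay on the Scala 2.10.x line that spark-core_2.10 2.1.0 is built against):

    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <!-- assumed 2.10.x version to match spark-core_2.10 -->
      <version>2.10.6</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>2.1.0</version>
    </dependency>

Alternatively, you can drop the explicit scala-library dependency and let spark-core pull in the matching Scala library transitively.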

The interface PairFunction is in spark-core, so your declared dependencies are fine.

https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/api/java/function/PairFunction.java

The issue is that spark-core is not found on the classpath at runtime.

The problem can be solved in a variety of ways depending on your set-up, but the easiest in your case is probably to pass the jar using the -libjars option of your hadoop command.

Try something like:

hadoop jar target/secondarysortspark-1.0-SNAPSHOT.jar ~/projects/secondarysortspark/secondarysortspark/src/main/java/com/tom/secondarysortspark/data.txt -libjars path/to/spark-core.jar

Let me know the results.

We have two options here: do a spark-submit and pass the dependency jars in the command with --jars, or else build a fat jar and do the spark-submit. Building the fat jar will fix the dependency issue.

E.g. spark-submit --master local --class your_class --jars dependency_jars jarfile_path input_arguments
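For the fat-jar route, a minimal maven-shade-plugin sketch could look like this (the plugin version and main class below are assumptions; adjust them to your project):

    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-shade-plugin</artifactId>
          <version>2.4.3</version>
          <executions>
            <execution>
              <phase>package</phase>
              <goals>
                <goal>shade</goal>
              </goals>
              <configuration>
                <transformers>
                  <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                    <!-- hypothetical main class; replace with your own -->
                    <mainClass>com.tom.secondarysortspark.SecondarySortSpark</mainClass>
                  </transformer>
                </transformers>
              </configuration>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>

If you submit with spark-submit, you may also want to mark the spark-core dependency as provided so the Spark runtime itself is not packed into the fat jar.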
