Correct Maven Dependency for Spark API
I am wondering if the following dependency declarations are not enough to get access to the following class:
org.apache.spark.api.java.function.PairFunction
Error at runtime:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/api/java/function/PairFunction
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.api.java.function.PairFunction
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
Dependencies declared:
<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.12.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>2.1.0</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>3.8.1</version>
        <scope>test</scope>
    </dependency>
</dependencies>
I am running the .jar like so:
hadoop jar target/secondarysortspark-1.0-SNAPSHOT.jar ~/projects/secondarysortspark/secondarysortspark/src/main/java/com/tom/secondarysortspark/data.txt
Thanks
For your Spark version, the Scala version is not right. I have the following properties and it works:
<properties>
    <scala.tools.version>2.10</scala.tools.version>
    <scala.version>2.10.4</scala.version>
    <spark.version>1.6.1</spark.version>
</properties>
At least Scala 2.10 is required, as the _2.10 suffix declared on your spark-core dependency indicates (take a look here: spark).
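In other words, the question's POM mixes scala-library 2.12.1 with a Spark artifact built for Scala 2.10. A consistent set of declarations, sketched here using the properties shown above, would look like:

```xml
<!-- Sketch: keep the Scala library version in line with the Scala
     suffix of the Spark artifact. Versions come from the properties
     block above; adjust both together if you upgrade. -->
<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>                   <!-- 2.10.4 -->
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.tools.version}</artifactId>
        <version>${spark.version}</version>                   <!-- 1.6.1 -->
    </dependency>
</dependencies>
```

Alternatively, to stay on Spark 2.1.0, the same rule applies the other way: pick the spark-core artifact whose suffix matches your Scala line (for example spark-core_2.11 with a 2.11.x scala-library).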
The interface PairFunction is in spark-core, so your declared dependencies are fine.
https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/api/java/function/PairFunction.java
The issue is that spark-core is not found on the classpath at runtime.
The problem can be solved in a variety of ways depending on your set-up, but the easiest in your case is likely to pass the jar using the -libjars option of your hadoop command. Note that generic options such as -libjars must come before the application arguments. Try something like:
hadoop jar target/secondarysortspark-1.0-SNAPSHOT.jar -libjars path/to/spark-core.jar ~/projects/secondarysortspark/secondarysortspark/src/main/java/com/tom/secondarysortspark/data.txt
Let me know the results.
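One caveat worth adding: -libjars is only honored when the driver's main class actually parses Hadoop's generic options, typically by going through ToolRunner. A minimal driver skeleton (class and method bodies are hypothetical, only the ToolRunner wiring is the point):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver: ToolRunner strips generic options such as -libjars
// from the argument list before run() is called, and arranges for the
// listed jars to be placed on the job's classpath.
public class SecondarySortSparkDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // By this point args no longer contains -libjars;
        // args[0] is the input path passed on the command line.
        // ... build and submit the job here ...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new Configuration(),
                                      new SecondarySortSparkDriver(), args);
        System.exit(exitCode);
    }
}
```

If your main class is a plain main method that never goes through ToolRunner or GenericOptionsParser, -libjars will be passed through as an ordinary argument and silently ignored.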
We have two options here: do a spark-submit and pass the dependency jars in the command with --jars, or else build a fat jar and do the spark-submit with that. Building the fat jar will fix the dependency issue.
E.g.:
spark-submit --master local --class your_class --jars dependency_jars jarfile_path input_arguments
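The fat-jar route mentioned above can be set up in Maven with the shade plugin, for example (a sketch; the plugin version shown is one known release, adjust as needed):

```xml
<!-- Sketch: bind the shade goal to the package phase so that
     `mvn package` produces an uber-jar containing the project's
     runtime dependencies. -->
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.4.3</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
```

When submitting with spark-submit, it is usual to mark spark-core with `<scope>provided</scope>` so the Spark runtime itself is not bundled into the fat jar.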