java.lang.ClassNotFoundException when running program on spark cluster
I have a Spark Scala program which loads a jar I wrote in Java. From that jar a static function is called, which tries to read a serialized object from a file (Pattern.class), but throws a java.lang.ClassNotFoundException. Running the Spark program locally works, but on the cluster workers it doesn't. It's especially weird because before I try to read from the file, I instantiate a Pattern object and there are no problems.
I am sure that the Pattern objects I wrote to the file are the same as the Pattern objects I am trying to read.
I've checked the jar on the slave machine and the Pattern class is there.
Does anyone have any idea what the problem might be? I can add more detail if needed.
This is the Pattern class:
public class Pattern implements Serializable {
    private static final long serialVersionUID = 588249593084959064L;

    public static enum RelationPatternType {NONE, LEFT, RIGHT, BOTH};

    RelationPatternType type;
    String entity;
    String pattern;
    List<Token> tokens;
    Relation relation = null;

    public Pattern(RelationPatternType type, String entity, List<Token> tokens, Relation relation) {
        this.type = type;
        this.entity = entity;
        this.tokens = tokens;
        this.relation = relation;
        if (this.tokens != null)
            this.pattern = StringUtils.join(" ", this.tokens.toString());
    }
}
I am reading the file from S3 the following way:
AmazonS3 s3Client = new AmazonS3Client(credentials);
S3Object confidentPatternsObject = s3Client.getObject(new GetObjectRequest("xxx", "confidentPatterns"));
InputStream objectData = confidentPatternsObject.getObjectContent();
ObjectInputStream ois = new ObjectInputStream(objectData);
Map<Pattern, Tuple2<Integer, Integer>> confidentPatterns =
        (Map<Pattern, Tuple2<Integer, Integer>>) ois.readObject();
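A common workaround for exactly this symptom (works locally, fails on executors) is that the class loader the default ObjectInputStream uses for resolution does not see the application jar on executors. The sketch below (my own, not the asker's code) resolves classes through the thread context class loader instead, which Spark points at the user jar:

```java
import java.io.*;

// Sketch: an ObjectInputStream that resolves classes via the thread context
// class loader before falling back to the default resolution strategy.
public class ContextAwareObjectInputStream extends ObjectInputStream {
    public ContextAwareObjectInputStream(InputStream in) throws IOException {
        super(in);
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        try {
            // Prefer the context class loader, which on Spark executors
            // includes the application jar.
            return Class.forName(desc.getName(), false,
                    Thread.currentThread().getContextClassLoader());
        } catch (ClassNotFoundException e) {
            // Fall back to ObjectInputStream's default class resolution.
            return super.resolveClass(desc);
        }
    }
}
```

You would then construct it from the S3 stream in place of the plain ObjectInputStream, e.g. `ois = new ContextAwareObjectInputStream(objectData);`.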
LE: I checked the classpath at runtime and the path to the jar was not there. I added it for the executors but I still have the same problem. I don't think that was it, as I have the Pattern class inside the jar that is calling the readObject function.
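When checking this kind of thing, it can also help to ask the JVM directly where it loaded a given class from, rather than inspecting the classpath by hand. A small sketch (run it with your own Pattern class in place of the example class):

```java
import java.security.CodeSource;

// Sketch: report which jar or directory a class was actually loaded from.
public class WhereLoaded {
    public static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader have no CodeSource.
        return src == null ? "<bootstrap>" : src.getLocation().toString();
    }

    public static void main(String[] args) {
        // On an executor, call this with Pattern.class instead.
        System.out.println(locationOf(WhereLoaded.class));
    }
}
```

If the printed location on an executor is not your application jar, the class is being resolved from somewhere unexpected.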
I would suggest adding this kind of method to print the classpath resources before the call, to make sure that everything is fine from the caller's point of view:
public static void printClassPathResources() {
    // On Java 8 the system class loader is a URLClassLoader, so this cast works.
    final ClassLoader cl = ClassLoader.getSystemClassLoader();
    final URL[] urls = ((URLClassLoader) cl).getURLs();
    LOG.info("Printing all classpath resources visible to the currently running class");
    for (final URL url : urls) {
        LOG.info(url.getFile());
    }
}
Then pass the required jars explicitly to spark-submit:

--conf "spark.driver.extraLibraryPath=$HADOOP_HOME/*:$HBASE_HOME/*:$HADOOP_HOME/lib/*:$HBASE_HOME/lib/htrace-core-3.1.0-incubating.jar:$HDFS_PATH/*:$SOLR_HOME/*:$SOLR_HOME/lib/*" \
--conf "spark.executor.extraLibraryPath=$HADOOP_HOME/*" \
--conf "spark.executor.extraClassPath=$(echo /your directory of jars/*.jar | tr ' ' ',')"
Another approach is to ship the application jar to the executors via the SparkConf:

val conf = new SparkConf().setAppName(appName).setJars(Seq(System.getProperty("user.dir") + "/target/scala-2.10/sparktest.jar"))
This should fix the vast majority of class-not-found problems. Another option is to place your dependencies on the default classpath on all of the worker nodes in the cluster. This way you won't have to pass around a large jar.
The only other major source of class-not-found issues stems from different versions of the libraries in use. For example, if you don't use identical versions of the common libraries in your application and on the Spark server, you will end up with classpath issues. This can occur when you compile against one version of a library (like Spark 1.1.0) and then attempt to run against a cluster with a different or out-of-date version (like Spark 0.9.2). Make sure that you are matching your library versions to whatever is being loaded onto the executor classpaths. A common example of this would be compiling against an alpha build of the Spark Cassandra Connector and then attempting to run using classpath references to an older version.