java.lang.ClassNotFoundException when running program on spark cluster
I have a Spark Scala program which loads a jar I wrote in Java. From that jar a static function is called, which tries to read a serialized object from a file (Pattern.class), but throws a java.lang.ClassNotFoundException. Running the Spark program locally works, but on the cluster workers it doesn't. It's especially weird because before I try to read from the file, I instantiate a Pattern object and there are no problems.
I am sure that the Pattern objects I wrote to the file are the same as the Pattern objects I am trying to read.
I've checked the jar on the slave machine and the Pattern class is there.
Does anyone have any idea what the problem might be? I can add more detail if needed.
This is the Pattern class:
public class Pattern implements Serializable {
    private static final long serialVersionUID = 588249593084959064L;

    public static enum RelationPatternType {NONE, LEFT, RIGHT, BOTH};

    RelationPatternType type;
    String entity;
    String pattern;
    List<Token> tokens;
    Relation relation = null;

    public Pattern(RelationPatternType type, String entity, List<Token> tokens, Relation relation) {
        this.type = type;
        this.entity = entity;
        this.tokens = tokens;
        this.relation = relation;
        if (this.tokens != null)
            this.pattern = StringUtils.join(" ", this.tokens.toString());
    }
}
I am reading the file from S3 the following way:
AmazonS3 s3Client = new AmazonS3Client(credentials);
S3Object confidentPatternsObject = s3Client.getObject(new GetObjectRequest("xxx", "confidentPatterns"));
InputStream objectData = confidentPatternsObject.getObjectContent();
ObjectInputStream ois = new ObjectInputStream(objectData);
Map<Pattern, Tuple2<Integer, Integer>> confidentPatterns =
        (Map<Pattern, Tuple2<Integer, Integer>>) ois.readObject();
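A common workaround for exactly this symptom (works locally, fails on executors) is that the class loader the default ObjectInputStream uses for resolution does not see the application jar on executors. The sketch below (my own, not the asker's code) resolves classes through the thread context class loader instead, which Spark points at the user jar:

```java
import java.io.*;

// Sketch: an ObjectInputStream that resolves classes via the thread context
// class loader before falling back to the default resolution strategy.
public class ContextAwareObjectInputStream extends ObjectInputStream {
    public ContextAwareObjectInputStream(InputStream in) throws IOException {
        super(in);
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        try {
            // Prefer the context class loader, which on Spark executors
            // includes the application jar.
            return Class.forName(desc.getName(), false,
                    Thread.currentThread().getContextClassLoader());
        } catch (ClassNotFoundException e) {
            // Fall back to ObjectInputStream's default class resolution.
            return super.resolveClass(desc);
        }
    }
}
```

You would then construct it from the S3 stream in place of the plain ObjectInputStream, e.g. `ois = new ContextAwareObjectInputStream(objectData);`.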
LE: I checked the classpath at runtime and the path to the jar was not there. I added it for the executors but I still have the same problem. I don't think that was it, as I have the Pattern class inside the jar that is calling the readObject function.
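When checking this kind of thing, it can also help to ask the JVM directly where it loaded a given class from, rather than inspecting the classpath by hand. A small sketch (run it with your own Pattern class in place of the example class):

```java
import java.security.CodeSource;

// Sketch: report which jar or directory a class was actually loaded from.
public class WhereLoaded {
    public static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader have no CodeSource.
        return src == null ? "<bootstrap>" : src.getLocation().toString();
    }

    public static void main(String[] args) {
        // On an executor, call this with Pattern.class instead.
        System.out.println(locationOf(WhereLoaded.class));
    }
}
```

If the printed location on an executor is not your application jar, the class is being resolved from somewhere unexpected.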
I would suggest adding this kind of method to print the classpath resources before the call, to make sure that everything is fine from the caller's point of view:
public static void printClassPathResources() {
    // On Java 8 the system class loader is a URLClassLoader, so this cast works.
    final ClassLoader cl = ClassLoader.getSystemClassLoader();
    final URL[] urls = ((URLClassLoader) cl).getURLs();
    LOG.info("Printing all classpath resources visible to the currently running class");
    for (final URL url : urls) {
        LOG.info(url.getFile());
    }
}
Then pass the required jars explicitly to spark-submit:

--conf "spark.driver.extraLibraryPath=$HADOOP_HOME/*:$HBASE_HOME/*:$HADOOP_HOME/lib/*:$HBASE_HOME/lib/htrace-core-3.1.0-incubating.jar:$HDFS_PATH/*:$SOLR_HOME/*:$SOLR_HOME/lib/*" \
--conf "spark.executor.extraLibraryPath=$HADOOP_HOME/*" \
--conf "spark.executor.extraClassPath=$(echo /your directory of jars/*.jar | tr ' ' ',')"
Another approach is to ship the application jar to the executors via the SparkConf:

val conf = new SparkConf().setAppName(appName).setJars(Seq(System.getProperty("user.dir") + "/target/scala-2.10/sparktest.jar"))
This should fix the vast majority of class-not-found problems. Another option is to place your dependencies on the default classpath on all of the worker nodes in the cluster. This way you won't have to pass around a large jar.
The only other major source of class-not-found issues stems from different versions of the libraries in use. For example, if you don't use identical versions of the common libraries in your application and on the Spark server, you will end up with classpath issues. This can occur when you compile against one version of a library (like Spark 1.1.0) and then attempt to run against a cluster with a different or out-of-date version (like Spark 0.9.2). Make sure that you are matching your library versions to whatever is being loaded onto the executor classpaths. A common example of this would be compiling against an alpha build of the Spark Cassandra Connector and then attempting to run using classpath references to an older version.