
ClassNotFoundException for Spark job on Yarn-cluster mode

So I am trying to run a Spark job in yarn-cluster mode, kicked off via an Oozie workflow, but I have been encountering the following error (relevant stack trace below):

java.sql.SQLException: ERROR 103 (08004): Unable to establish connection.
    at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:388)
    at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:296)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.access$300(ConnectionQueryServicesImpl.java:179)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1917)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1896)
    at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:77)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:1896)
    at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:180)
    at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.connect(PhoenixEmbeddedDriver.java:132)
    at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:151)
    at java.sql.DriverManager.getConnection(DriverManager.java:664)
    at java.sql.DriverManager.getConnection(DriverManager.java:208)
    ...
Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
    at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:414)
    at org.apache.hadoop.hbase.client.ConnectionManager.createConnectionInternal(ConnectionManager.java:323)
    at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:144)
    at org.apache.phoenix.query.HConnectionFactory$HConnectionFactoryImpl.createConnection(HConnectionFactory.java:47)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:294)
    ... 28 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
    ... 33 more
Caused by: java.lang.UnsupportedOperationException: Unable to find org.apache.hadoop.hbase.ipc.controller.ClientRpcControllerFactory
    at org.apache.hadoop.hbase.util.ReflectionUtils.instantiateWithCustomCtor(ReflectionUtils.java:36)
    at org.apache.hadoop.hbase.ipc.RpcControllerFactory.instantiate(RpcControllerFactory.java:58)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.createAsyncProcess(ConnectionManager.java:2317)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:688)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:630)
    ... 38 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.ipc.controller.ClientRpcControllerFactory
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:264)
    at org.apache.hadoop.hbase.util.ReflectionUtils.instantiateWithCustomCtor(ReflectionUtils.java:32)
    ... 42 more

Some background information:

  • The job runs on Spark 1.4.1 (the correct spark.yarn.jar field is specified in the spark.conf file).
  • oozie.libpath is set to the HDFS directory in which my program's jar resides.
  • org.apache.hadoop.hbase.ipc.controller.ClientRpcControllerFactory, the class that is not found, exists in phoenix-4.5.1-HBase-1.0-client.jar. I've specified this jar in spark.driver.extraClassPath and spark.executor.extraClassPath in my spark.conf file, and I've also added the phoenix-core dependency in my pom file, so the class exists in my shaded project jar as well (a sketch of the relevant spark.conf entries follows below).
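
For concreteness, the relevant entries in my spark.conf look roughly like this (the paths here are placeholders, not my actual locations):

    spark.yarn.jar                  hdfs:///path/to/spark-assembly-1.4.1.jar
    spark.driver.extraClassPath     /path/to/phoenix-4.5.1-HBase-1.0-client.jar
    spark.executor.extraClassPath   /path/to/phoenix-4.5.1-HBase-1.0-client.jar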

Observations so far:

  • Adding an extra field, spark.driver.userClassPathFirst, to my spark.conf file and setting it to true gets rid of the ClassNotFoundException. However, it also prevents me from initializing a Spark context (null pointer exception). From googling around, it seems that including this field messes up classpaths, so it may not be the way to go, since I cannot even initialize a Spark context this way.
  • I noticed that in the Oozie stdout log, I do not see the classpath of the Phoenix jar. So maybe for some reason spark.driver.extraClassPath and spark.executor.extraClassPath aren't actually picking up the jar as an extraClassPath? I do know that I'm specifying the correct jar file path, as other jobs have spark.conf files with the same parameters. (A small classpath probe like the one sketched after this list can help confirm what the JVM actually sees.)
  • I found a hacky way to make the Phoenix jar show up in the classpath (in the Oozie stdout log) by copying the jar to the same directory as the one where my program jar resides. This works whether or not spark.executor.extraClassPath is changed to point to the new jar location. However, the ClassNotFoundException persists, even though I clearly see the ClientRpcControllerFactory class when I unzip the jar.
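
For what it's worth, the classpath check I'm describing can be done with a small probe like the following (my own sketch, not part of the job itself; the class name is the one from the stack trace):

    // A quick diagnostic: is the missing class visible to the class loader that loaded my code?
    val missing = "org.apache.hadoop.hbase.ipc.controller.ClientRpcControllerFactory"
    println("java.class.path = " + System.getProperty("java.class.path"))
    // getResource returns null when the class file is not visible to this class loader.
    val url = getClass.getClassLoader.getResource(missing.replace('.', '/') + ".class")
    println(missing + " resolves to: " + url)
    // The same check can be shipped to an executor, e.g.:
    //   sc.parallelize(1 to 1, 1).map(_ => System.getProperty("java.class.path")).collect().foreach(println)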

Other things I've tried:

  • I tried using the sparkConf.setJars() and sparkContext.addJar() methods (roughly as in the sketch after this list), but still encountered the same error.
  • I added the jar to the spark.driver.extraClassPath field in my job properties file, but it hasn't seemed to help (the Spark docs indicate that this field is necessary when running in client mode, so it may not be relevant for my case).
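
For reference, this is roughly how I invoked those methods; the jar path is a placeholder rather than my real location:

    import org.apache.spark.{SparkConf, SparkContext}

    // Placeholder path; the real jar lives elsewhere on HDFS.
    val phoenixJar = "hdfs:///path/to/phoenix-4.5.1-HBase-1.0-client.jar"
    val conf = new SparkConf()
      .setAppName("my-phoenix-job")
      .setJars(Seq(phoenixJar))
    val sc = new SparkContext(conf)
    // addJar ships the jar to executors for subsequent tasks,
    // but it does not change the driver's own classpath.
    sc.addJar(phoenixJar)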

Any help/ideas/suggestions would be greatly appreciated.

I use CDH 5.5.1 + Phoenix 4.5.2 (both installed with parcels) and faced the same problem. I think the problem disappeared after I switched to client mode. I can't verify this because I am getting another error with cluster mode now.

I tried to trace the Phoenix source code and found some interesting things. I hope a Java / Scala expert can identify the root cause.

  1. The PhoenixDriver class was loaded. This shows the jar was found initially. After some layers of class loaders / context switches (?), the jar was lost from the classpath.
  2. If I Class.forName() a non-existent class in my program, the stack does not go through sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331). The stack looks like:

     java.lang.ClassNotFoundException: NONEXISTINGCLASS
         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
         at java.lang.Class.forName0(Native Method)
         at java.lang.Class.forName(Class.java:264)
  3. I copied Phoenix code into my program for testing. I still get the ClassNotFoundException if I call ConnectionQueryServicesImpl.init (ConnectionQueryServicesImpl.java:1896). However, a call to ConnectionQueryServicesImpl.openConnection (ConnectionQueryServicesImpl.java:296) returned a usable HBase connection. So it seems PhoenixContextExecutor was causing the loss of the jar, but I don't know how. (A probe like the one sketched after this list can show which class loader each call path resolves against.)
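
To see which class loader each call path ends up using, a probe along these lines can help (my own sketch; the class name is taken from the stack trace above). If PhoenixContextExecutor really swaps the thread context class loader, as I suspect, the two probes may report different results:

    // Try to resolve the factory class through two different class loaders
    // and print where each one finds it (or that it does not).
    val fqcn = "org.apache.hadoop.hbase.ipc.controller.ClientRpcControllerFactory"
    def probe(label: String, cl: ClassLoader): Unit = {
      try {
        val c = Class.forName(fqcn, false, cl)
        val src = Option(c.getProtectionDomain.getCodeSource).map(_.getLocation).orNull
        println(label + " -> loaded from " + src)
      } catch {
        case _: ClassNotFoundException => println(label + " -> NOT FOUND")
      }
    }
    probe("class loader of my code", getClass.getClassLoader)
    probe("thread context class loader", Thread.currentThread().getContextClassLoader)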

Source code of Cloudera Phoenix 4.5.2: https://github.com/cloudera-labs/phoenix/blob/phoenix1-4.5.2_1.2.0/phoenix-core/src/main/java/org/apache/

(Not sure whether I should post a comment... but I have no reputation anyway)

So I managed to fix my issue and get my job to run. My solution is very hacky, but I will post it here in case it helps others in the future.

Basically, the problem as I understand it was that the org.apache.hadoop.hbase.util.ReflectionUtils class, which is responsible for finding the ClientRpcControllerFactory class, was being loaded from some Cloudera directory on the cluster instead of from my own jar. When I set spark.driver.userClassPathFirst to true, it prioritized loading the ReflectionUtils class from my jar, and so it was able to locate the ClientRpcControllerFactory class. But that messed up some other classpaths and kept giving me a NullPointerException when I tried to initialize a SparkContext, so I looked for another solution.

I tried to figure out whether it was possible to exclude all of the default CDH jars from my classpath, but found that the value in spark.yarn.jar was pulling in all of those CDH jars, and I definitely needed to specify that jar.

So the solution was to include all classes under org.apache.hadoop.hbase from the Phoenix jar in the spark-assembly jar (the jar that spark.yarn.jar pointed to), which got rid of the original exception and did not give me an NPE when trying to initialize a SparkContext. I found that the ReflectionUtils class was now being loaded from the spark-assembly jar, and since the ClientRpcControllerFactory class was also included in that jar, it was able to find it. After this, I encountered a few more ClassNotFoundExceptions for Phoenix classes, so I put those classes into the spark-assembly jar as well.
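
The merge itself was nothing clever; roughly the following, with jar names and paths as placeholders for mine:

    # Extract the org.apache.hadoop.hbase classes from the Phoenix client jar
    unzip phoenix-4.5.1-HBase-1.0-client.jar 'org/apache/hadoop/hbase/*' -d extracted
    # Add them to the spark-assembly jar that spark.yarn.jar points to
    jar uf spark-assembly-1.4.1.jar -C extracted org/apache/hadoop/hbase
    # Then put the modified assembly back where spark.yarn.jar expects it (e.g. on HDFS)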

Finally, I hit a java.lang.RuntimeException: hbase-default.xml file seems to be for and old version of HBase problem. I found that my application jar contained such a file, but changing hbase.defaults.for.version.skip to true in it didn't do anything. So I included another hbase-default.xml file in the spark-assembly jar with the skip flag set to true, and it finally worked.
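
For completeness, the extra hbase-default.xml I dropped into the spark-assembly jar looked roughly like this (just the skip flag; a sketch, not a full HBase defaults file):

    <?xml version="1.0"?>
    <configuration>
      <property>
        <name>hbase.defaults.for.version.skip</name>
        <value>true</value>
      </property>
    </configuration>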

Some observations:

  • I noticed that my spark-assembly jar was completely missing an org.apache.hadoop.hbase directory. A coworker told me that I should usually expect to find an hbase directory in my spark-assembly jar, so maybe I was working with a bad spark-assembly jar. Edit: I checked a spark-assembly jar that I newly downloaded (v1.5.2) and it doesn't have one either, so maybe the org.apache.hadoop.hbase package is just not included in it.
  • ClassNotFoundExceptions and class loader problems are hard to debug.
