![](/img/trans.png)
[英]When Spark call Hive from oozie, exception raised “java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.metadata.HiveException”
[英]org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions exception when writing a hive partitioned table from spark(2.11) dataframe
我有这种奇怪的行为,我的用例是通过使用以下命令将Spark数据帧写入配置单元分区表
sqlContext.sql("INSERT OVERWRITE TABLE <table> PARTITION (<partition column) SELECT * FROM <temp table from dataframe>")
奇怪的是,这在从主机A使用pyspark shell时有效,但是在jupyter笔记本中使用相同的精确代码,连接到相同的群集,使用相同的配置单元表却不起作用,它将返回:
java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions
在我看来,异常是因为启动pyspark shell的主机与启动jupyter的主机之间存在一些jar不匹配,我的问题是,我如何确定pyspark shell中将使用哪个版本的相应jar,以及按代码显示jupyter笔记本(我无法访问jupyter服务器)? 如果pyspark shell和jupyter都连接到同一集群,为什么要使用2个不同的版本?
更新 :经过一些研究,我发现jupyter正在使用“ Livy”,并且Livy主机使用hive-exec-2.0.1.jar,我们使用pyspark shell的主机使用hive-exec-1.2.1000.2.5.3.58-3.jar ,所以我从maven存储库下载了两个jar并进行了反编译,发现两者中都存在全部loadDynamicPartitions方法,方法签名(参数)有所不同,在livy版本中缺少boolean holdDDLTime参数。
我有类似的问题尝试从cloudera获取Maven依赖项
<dependencies>
<!-- Scala and Spark dependencies -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.0-cdh5.9.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.6.0-cdh5.9.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.10</artifactId>
<version>1.6.0-cdh5.9.2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hive/hive-exec -->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>1.1.0-cdh5.9.2</version>
</dependency>
<dependency>
<groupId>org.scalatest</groupId>
<artifactId>scalatest_2.10</artifactId>
<version>3.0.0-SNAP4</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.10</artifactId>
<version>1.4.1</version>
</dependency>
<dependency>
<groupId>commons-dbcp</groupId>
<artifactId>commons-dbcp</artifactId>
<version>1.2.2</version>
</dependency>
<dependency>
<groupId>com.databricks</groupId>
<artifactId>spark-csv_2.10</artifactId>
<version>1.4.0</version>
</dependency>
<dependency>
<groupId>com.databricks</groupId>
<artifactId>spark-xml_2.10</artifactId>
<version>0.2.0</version>
</dependency>
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk</artifactId>
<version>1.0.12</version>
</dependency>
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-s3</artifactId>
<version>1.11.172</version>
</dependency>
<dependency>
<groupId>com.github.scopt</groupId>
<artifactId>scopt_2.10</artifactId>
<version>3.2.0</version>
</dependency>
<dependency>
<groupId>javax.mail</groupId>
<artifactId>mail</artifactId>
<version>1.4</version>
</dependency>
</dependencies>
<repositories>
<repository>
<id>maven-hadoop</id>
<name>Hadoop Releases</name>
<url>https://repository.cloudera.com/content/repositories/releases/</url>
</repository>
<repository>
<id>cloudera-repos</id>
<name>Cloudera Repos</name>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
</repositories>
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.