
Spark saveAsTable append saves data to hive but throws an error: org.apache.hadoop.hive.ql.metadata.Hive.alterTable


I'm trying to append data to an existing table in Hive, but when I call

sdf.write.format("parquet").mode("append").saveAsTable("db.tbl", path=hdfs_path)

Data is saved successfully, but I get this error:

Py4JJavaError: An error occurred while calling o152.saveAsTable.
: java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.alterTable(java.lang.String, org.apache.hadoop.hive.ql.metadata.Table, org.apache.hadoop.hive.metastore.api.EnvironmentContext)
    at java.lang.Class.getMethod(Class.java:1786)
    at org.apache.spark.sql.hive.client.Shim.findMethod(HiveShim.scala:177)
    at org.apache.spark.sql.hive.client.Shim_v2_1.alterTableMethod$lzycompute(HiveShim.scala:1183)
    at org.apache.spark.sql.hive.client.Shim_v2_1.alterTableMethod(HiveShim.scala:1177)
    at org.apache.spark.sql.hive.client.Shim_v2_1.alterTable(HiveShim.scala:1230)
    at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$alterTable$1(HiveClientImpl.scala:572)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:294)
    at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:227)
    at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:226)
    at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:276)
    at org.apache.spark.sql.hive.client.HiveClientImpl.alterTable(HiveClientImpl.scala:562)
    at org.apache.spark.sql.hive.client.HiveClient.alterTable(HiveClient.scala:107)
    at org.apache.spark.sql.hive.client.HiveClient.alterTable$(HiveClient.scala:106)
    at org.apache.spark.sql.hive.client.HiveClientImpl.alterTable(HiveClientImpl.scala:90)
    at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$alterTableStats$1(HiveExternalCatalog.scala:719)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:103)
    at org.apache.spark.sql.hive.HiveExternalCatalog.alterTableStats(HiveExternalCatalog.scala:705)
    at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.alterTableStats(ExternalCatalogWithListener.scala:133)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTableStats(SessionCatalog.scala:420)
    at org.apache.spark.sql.execution.command.CommandUtils$.updateTableStats(CommandUtils.scala:63)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:198)
    at org.apache.spark.sql.execution.datasources.DataSource.writeAndRead(DataSource.scala:538)
    at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.saveDataIntoTable(createDataSourceTables.scala:219)
    at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:167)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:122)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:121)
    at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:963)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:963)
    at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:727)
    at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:705)
    at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:603)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

I tried some alternatives too:

sdf.write.insertInto("db.tbl", overwrite=False)
sdf.write.mode("append").insertInto("db.tbl")
spark.sql("INSERT INTO db.tbl VALUES (...)")

But I hit the same issue every time. It looks like any attempt to add data to an existing table succeeds at writing the data but still throws that error (the stack trace shows the failure happens in CommandUtils.updateTableStats, after the write itself completes).

The "overwrite" mode is working good. “覆盖”模式运行良好。

The Spark version I'm using is 3.0.1, and the Hive version is 3.1.0.

Has anyone faced this issue before?

It looks like some of the Hive metastore artifacts referenced by Spark 3 are for Hive 2.x rather than the Hive 3.x you are using.
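
As a quick check (a sketch, assuming an active SparkSession named spark and the standard Spark SQL config keys), you can print which Hive metastore client version and jars Spark was configured to use:

print(spark.version)                                        # Spark itself, e.g. 3.0.1
print(spark.conf.get("spark.sql.hive.metastore.version"))   # Hive client version Spark talks to the metastore with
print(spark.conf.get("spark.sql.hive.metastore.jars"))      # "builtin", "maven", or a classpath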

You definitely have the wrong Hive jar in your environment:

  • Your Spark is referring to Hive 3.x, which has the method alterTable(String, Table, EnvironmentContext).
  • However, according to your comment, you have hive-metastore-1.21.2.3.1.4.41-5.jar, which comes from the Hortonworks distribution; you can download its source code and verify for yourself that there is no such method.
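
One way out (a sketch, under the assumption that Hive client jars matching your metastore are available at the same path on every node; the jar path and app name below are placeholders) is to pin Spark's Hive metastore client version and jars when building the session:

from pyspark.sql import SparkSession

# Sketch: point Spark at Hive client jars that match the metastore version.
# "/path/to/hive-3.1-jars/*" is a placeholder for the real jar location;
# these settings must be applied before the first Hive client call.
spark = (
    SparkSession.builder
    .appName("append-to-hive")                                        # placeholder name
    .config("spark.sql.hive.metastore.version", "3.1")                # match your metastore
    .config("spark.sql.hive.metastore.jars", "/path/to/hive-3.1-jars/*")
    .enableHiveSupport()
    .getOrCreate()
)

With the client version and jars aligned, Spark's shim should resolve an alterTable signature that actually exists in the jars on the classpath.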
