
Spark saveAsTable append saves data to Hive but throws an error: org.apache.hadoop.hive.ql.metadata.Hive.alterTable

I'm trying to append data to an existing table in Hive, but when I call

sdf.write.format("parquet").mode("append").saveAsTable("db.tbl", path=hdfs_path)

the data is saved successfully, yet I get this error:

Py4JJavaError: An error occurred while calling o152.saveAsTable.
: java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.alterTable(java.lang.String, org.apache.hadoop.hive.ql.metadata.Table, org.apache.hadoop.hive.metastore.api.EnvironmentContext)
    at java.lang.Class.getMethod(Class.java:1786)
    at org.apache.spark.sql.hive.client.Shim.findMethod(HiveShim.scala:177)
    at org.apache.spark.sql.hive.client.Shim_v2_1.alterTableMethod$lzycompute(HiveShim.scala:1183)
    at org.apache.spark.sql.hive.client.Shim_v2_1.alterTableMethod(HiveShim.scala:1177)
    at org.apache.spark.sql.hive.client.Shim_v2_1.alterTable(HiveShim.scala:1230)
    at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$alterTable$1(HiveClientImpl.scala:572)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:294)
    at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:227)
    at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:226)
    at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:276)
    at org.apache.spark.sql.hive.client.HiveClientImpl.alterTable(HiveClientImpl.scala:562)
    at org.apache.spark.sql.hive.client.HiveClient.alterTable(HiveClient.scala:107)
    at org.apache.spark.sql.hive.client.HiveClient.alterTable$(HiveClient.scala:106)
    at org.apache.spark.sql.hive.client.HiveClientImpl.alterTable(HiveClientImpl.scala:90)
    at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$alterTableStats$1(HiveExternalCatalog.scala:719)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:103)
    at org.apache.spark.sql.hive.HiveExternalCatalog.alterTableStats(HiveExternalCatalog.scala:705)
    at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.alterTableStats(ExternalCatalogWithListener.scala:133)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTableStats(SessionCatalog.scala:420)
    at org.apache.spark.sql.execution.command.CommandUtils$.updateTableStats(CommandUtils.scala:63)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:198)
    at org.apache.spark.sql.execution.datasources.DataSource.writeAndRead(DataSource.scala:538)
    at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.saveDataIntoTable(createDataSourceTables.scala:219)
    at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:167)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:122)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:121)
    at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:963)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:963)
    at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:727)
    at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:705)
    at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:603)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

I tried some alternatives too:

sdf.write.insertInto("db.tbl", overwrite=False)
sdf.write.mode("append").insertInto("db.tbl")
spark.sql("INSERT INTO db.tbl VALUES (...)")

All of them hit the same issue. It seems that any attempt to add data to an existing table succeeds but still throws that error.

The "overwrite" mode is working good.

The Spark version I'm using is 3.0.1, and the Hive version is 3.1.0.

Has anyone faced this issue before?

It looks like the Hive metastore artifacts referenced by Spark 3 here are Hive 2.x ones, not the Hive 3.x you are using.
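
One way to confirm which Hive client the session is actually using is to read its metastore settings. A minimal sketch, assuming a running PySpark session named spark (in Spark 3.0.x the version defaults to 2.3.x and the jars to "builtin" unless overridden at launch):

# Which Hive metastore client version Spark was configured with
print(spark.conf.get("spark.sql.hive.metastore.version"))
# Where the Hive client jars come from: "builtin", "maven", or a classpath
print(spark.conf.get("spark.sql.hive.metastore.jars"))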

You definitely have the wrong Hive JAR in your environment:

  • Your Spark is referring to Hive 3.x, which has the method alterTable(String, Table, EnvironmentContext).
  • However, according to your comment, you have hive-metastore-1.21.2.3.1.4.41-5.jar, which comes from the Hortonworks distribution; you can download its source code and verify for yourself that there is no such method.
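
If the JAR mismatch is the cause, one fix is to point Spark's Hive client at a matching metastore version and set of jars. A minimal sketch, assuming the Hive 3.1.0 client jars can be resolved from Maven at startup (otherwise set spark.sql.hive.metastore.jars to a classpath containing your cluster's Hive 3.1 jars); the app name is hypothetical, and these are static configs that must be set before the session is created:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("append-to-hive")  # hypothetical app name
    # Tell Spark which Hive metastore client version to instantiate
    .config("spark.sql.hive.metastore.version", "3.1.0")
    # "maven" downloads matching client jars at startup; a classpath string
    # pointing at the cluster's Hive 3.1 jars works without internet access
    .config("spark.sql.hive.metastore.jars", "maven")
    .enableHiveSupport()
    .getOrCreate()
)

# Retry the original append once the client versions match
sdf.write.format("parquet").mode("append").saveAsTable("db.tbl")

The same settings can be passed on the command line via spark-submit --conf; changing them inside an already-running session has no effect once the Hive client has been created.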
