
Spark saveAsTable append saves data to Hive but throws an error: org.apache.hadoop.hive.ql.metadata.Hive.alterTable

I'm trying to append data to an existing table in Hive, but when I call

sdf.write.format("parquet").mode("append").saveAsTable("db.tbl", path=hdfs_path)

the data is saved successfully, yet I get this error:

Py4JJavaError: An error occurred while calling o152.saveAsTable.
: java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.alterTable(java.lang.String, org.apache.hadoop.hive.ql.metadata.Table, org.apache.hadoop.hive.metastore.api.EnvironmentContext)
    at java.lang.Class.getMethod(Class.java:1786)
    at org.apache.spark.sql.hive.client.Shim.findMethod(HiveShim.scala:177)
    at org.apache.spark.sql.hive.client.Shim_v2_1.alterTableMethod$lzycompute(HiveShim.scala:1183)
    at org.apache.spark.sql.hive.client.Shim_v2_1.alterTableMethod(HiveShim.scala:1177)
    at org.apache.spark.sql.hive.client.Shim_v2_1.alterTable(HiveShim.scala:1230)
    at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$alterTable$1(HiveClientImpl.scala:572)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:294)
    at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:227)
    at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:226)
    at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:276)
    at org.apache.spark.sql.hive.client.HiveClientImpl.alterTable(HiveClientImpl.scala:562)
    at org.apache.spark.sql.hive.client.HiveClient.alterTable(HiveClient.scala:107)
    at org.apache.spark.sql.hive.client.HiveClient.alterTable$(HiveClient.scala:106)
    at org.apache.spark.sql.hive.client.HiveClientImpl.alterTable(HiveClientImpl.scala:90)
    at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$alterTableStats$1(HiveExternalCatalog.scala:719)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:103)
    at org.apache.spark.sql.hive.HiveExternalCatalog.alterTableStats(HiveExternalCatalog.scala:705)
    at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.alterTableStats(ExternalCatalogWithListener.scala:133)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.alterTableStats(SessionCatalog.scala:420)
    at org.apache.spark.sql.execution.command.CommandUtils$.updateTableStats(CommandUtils.scala:63)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:198)
    at org.apache.spark.sql.execution.datasources.DataSource.writeAndRead(DataSource.scala:538)
    at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.saveDataIntoTable(createDataSourceTables.scala:219)
    at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:167)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
    at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:122)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:121)
    at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:963)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:963)
    at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:727)
    at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:705)
    at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:603)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

I tried some alternatives too:

sdf.write.insertInto("db.tbl", overwrite=False)
sdf.write.mode("append").insertInto("db.tbl")
spark.sql("INSERT INTO db.tbl VALUES (...)")

All of them hit the same issue. It seems that any attempt to add data to an existing table succeeds but still throws that error.

The "overwrite" mode is working good.

The Spark version I'm using is 3.0.1, and the Hive version is 3.1.0.

Has anyone faced this issue before?

It looks like the Hive metastore artifacts referenced by Spark 3 here are Hive 2.x ones, not the Hive 3.x you are using.
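
One way to confirm which Hive client the session is actually using is to read its metastore settings. A minimal sketch, assuming a running PySpark session named spark (in Spark 3.0.x the version defaults to 2.3.x and the jars to "builtin" unless overridden at launch):

# Which Hive metastore client version Spark was configured with
print(spark.conf.get("spark.sql.hive.metastore.version"))
# Where the Hive client jars come from: "builtin", "maven", or a classpath
print(spark.conf.get("spark.sql.hive.metastore.jars"))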

You definitely have the wrong Hive JAR in your environment:

  • Your Spark is referring to Hive 3.x, which has the method alterTable(String, Table, EnvironmentContext).
  • However, according to your comment, you have hive-metastore-1.21.2.3.1.4.41-5.jar, which comes from the Hortonworks distribution; you can download its source code and verify for yourself that there is no such method.
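
If the JAR mismatch is the cause, one fix is to point Spark's Hive client at a matching metastore version and set of jars. A minimal sketch, assuming the Hive 3.1.0 client jars can be resolved from Maven at startup (otherwise set spark.sql.hive.metastore.jars to a classpath containing your cluster's Hive 3.1 jars); the app name is hypothetical, and these are static configs that must be set before the session is created:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("append-to-hive")  # hypothetical app name
    # Tell Spark which Hive metastore client version to instantiate
    .config("spark.sql.hive.metastore.version", "3.1.0")
    # "maven" downloads matching client jars at startup; a classpath string
    # pointing at the cluster's Hive 3.1 jars works without internet access
    .config("spark.sql.hive.metastore.jars", "maven")
    .enableHiveSupport()
    .getOrCreate()
)

# Retry the original append once the client versions match
sdf.write.format("parquet").mode("append").saveAsTable("db.tbl")

The same settings can be passed on the command line via spark-submit --conf; changing them inside an already-running session has no effect once the Hive client has been created.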
