Table created with saveAsTable behaves differently from a table created with spark.sql("CREATE TABLE ...")

My periodically running process writes data to a Parquet-backed table, with "spark.sql.sources.partitionOverwriteMode" set to "dynamic", using the following code:

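// Assumes an active SparkSession `spark` and a DataFrame `df` that has a column partitionCol
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

val tableExists = spark.catalog.tableExists("tablename")

if (!tableExists) {
  // First run: saveAsTable creates the partitioned table in the metastore
  df.write
    .mode("overwrite")
    .partitionBy("partitionCol")
    .format("parquet")
    .saveAsTable("tablename")
} else {
  // Later runs: insertInto resolves columns by position and uses the existing
  // table's partitioning, so partitionBy must not be called here
  df.write
    .format("parquet")
    .mode("overwrite")
    .insertInto("tablename")
}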

If the table doesn't exist, it is created in the first branch and everything works; on the next run, when the table exists and the else branch runs, the write also works as expected.

However, when I create the table over existing Parquet files, either through a Hive session or with spark.sql("CREATE TABLE ..."), and then run the process, the write fails with this error:

org.apache.spark.SparkException: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict

Adding this setting to the Spark conf solves the issue, but I don't understand why it is needed when the table is created with a CREATE TABLE command and not when it is created with saveAsTable.
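For reference, a sketch of the two usual places to put such settings, for spark-shell or a Scala script. The application name is a hypothetical placeholder; the two configuration keys are the ones discussed in this thread.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: the app name is a hypothetical placeholder.
val spark = SparkSession.builder()
  .appName("periodic-partition-writer")
  .enableHiveSupport() // needed to work with Hive tables in the metastore
  .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
  .config("hive.exec.dynamic.partition.mode", "nonstrict")
  .getOrCreate()

// Or, on a session that already exists, change it at runtime:
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
```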

Also, I don't understand how this setting is relevant to Spark. From what I've read, a static partition here means directly specifying the partition to write into, instead of only specifying the column to partition by. Is it even possible to do such an insert in Spark (as opposed to HiveQL)?

Spark 2.4, Hadoop 3.1

The two settings below are not the same thing.

hive.exec.dynamic.partition.mode

https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties

spark.sql.sources.partitionOverwriteMode

https://spark.apache.org/docs/latest/configuration.html

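For Spark, spark.sql.sources.partitionOverwriteMode controls whether partitions are deleted before an INSERT OVERWRITE writes new data. In static mode (the default), Spark deletes every partition that matches the partition specification before writing; in dynamic mode it deletes nothing up front and overwrites only the partitions that receive data. The setting applies to data source tables; Hive serde tables are always overwritten in dynamic mode.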
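The difference between the two overwrite modes can be illustrated with a toy model in plain Scala, with no Spark dependency. A table is modeled as a hypothetical map from partition value to rows, and each function mimics what INSERT OVERWRITE leaves behind. This illustrates the documented semantics under those simplifying assumptions; it is not Spark's implementation.

```scala
// Toy model of INSERT OVERWRITE for a table partitioned by one column.
// The Table type and both functions are hypothetical illustrations, not Spark APIs.
object OverwriteModes {
  // partition value -> rows stored in that partition
  type Table = Map[String, Vector[String]]

  // Group incoming (partitionValue, row) pairs into partitions.
  private def toPartitions(incoming: Seq[(String, String)]): Table =
    incoming.groupBy(_._1).map { case (p, rows) => p -> rows.map(_._2).toVector }

  // Static mode: first delete every existing partition matching the static spec
  // (None means no static value, which matches all partitions), then write.
  def staticOverwrite(existing: Table, spec: Option[String], incoming: Seq[(String, String)]): Table = {
    val survivors = existing.filter { case (p, _) => spec.exists(_ != p) }
    survivors ++ toPartitions(incoming)
  }

  // Dynamic mode: nothing is deleted up front; only partitions that receive
  // rows are replaced, and all others are left untouched.
  def dynamicOverwrite(existing: Table, incoming: Seq[(String, String)]): Table =
    existing ++ toPartitions(incoming)

  def main(args: Array[String]): Unit = {
    val existing: Table = Map("2020-01" -> Vector("old-jan"), "2020-02" -> Vector("old-feb"))
    val incoming = Seq("2020-02" -> "new-feb")

    // No static partition value (like insertInto): static mode drops 2020-01 too
    println(staticOverwrite(existing, None, incoming))
    // Dynamic mode keeps 2020-01 and replaces only 2020-02
    println(dynamicOverwrite(existing, incoming))
  }
}
```

With no static partition value, the static-mode call leaves only the 2020-02 partition, while the dynamic-mode call keeps 2020-01 intact, which is why the question's process sets the mode to dynamic.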

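For Hive, hive.exec.dynamic.partition.mode restricts the syntax of an INSERT into a partitioned table. In strict mode the statement must give at least one static partition value (for example PARTITION (partitionCol = 'a')); in nonstrict mode all partition columns may be dynamic. In Spark 2.4 this check only runs when Spark writes to a Hive serde table. That explains the difference in the question: saveAsTable with format("parquet") creates a Spark data source table, so insertInto goes through Spark's own writer and never consults the Hive setting, whereas Hive-style DDL such as CREATE TABLE ... STORED AS PARQUET (run in Hive or through spark.sql) creates a Hive serde table, and insertInto, which passes no static partition values, then fails the strict-mode check.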
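The strict-mode rule can likewise be sketched as a toy check in plain Scala. A partition spec is modeled as a hypothetical list of (column, optional value) pairs, where Some(value) marks a static partition column and None a dynamic one; the error text mirrors the message quoted in the question. As for whether Spark can do a static-partition insert at all: Spark SQL accepts the Hive-style PARTITION clause, e.g. spark.sql("INSERT OVERWRITE TABLE tablename PARTITION (partitionCol = 'a') SELECT ..."), where the SELECT list omits the static column. A plain df.write.insertInto(...) supplies no static values, so every partition column is dynamic.

```scala
// Toy model of the hive.exec.dynamic.partition.mode check; not Hive or Spark source.
object StrictModeCheck {
  // Hypothetical spec type: column -> Some(value) if static, None if dynamic.
  type PartitionSpec = Seq[(String, Option[String])]

  // Left(error) when the insert would be rejected, Right(()) when it is allowed.
  def check(spec: PartitionSpec, mode: String): Either[String, Unit] = {
    val numStatic = spec.count { case (_, value) => value.isDefined }
    val numDynamic = spec.size - numStatic
    if (numDynamic > 0 && numStatic == 0 && mode.equalsIgnoreCase("strict"))
      Left("Dynamic partition strict mode requires at least one static partition column. " +
        "To turn this off set hive.exec.dynamic.partition.mode=nonstrict")
    else
      Right(())
  }

  def main(args: Array[String]): Unit = {
    // What df.write.insertInto amounts to: PARTITION (partitionCol), all dynamic
    println(check(Seq("partitionCol" -> None), "strict"))
    // PARTITION (partitionCol = 'a'): one static column, accepted in strict mode
    println(check(Seq("partitionCol" -> Some("a")), "strict"))
    // All dynamic, but nonstrict mode allows it
    println(check(Seq("partitionCol" -> None), "nonstrict"))
  }
}
```

The sketch models only the single rule discussed here; Hive applies further rules to multi-level partition specs that it does not attempt to cover.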
