
ValidationFailureSemanticException: Partition spec contains non-partition columns

I am trying a simple use case: inserting into a partitioned Hive table on S3. I am running my code in a Zeppelin notebook on EMR. Below is my code, along with a screenshot of the output of the commands. I checked the schemas of the Hive table and the DataFrame, and there is no case difference in the column names. I am getting the exception shown below.

import org.apache.spark.sql.hive.HiveContext
import sqlContext.implicits._

System.setProperty("hive.metastore.uris","thrift://datalake-hive-server2.com:9083")
val hiveContext = new HiveContext(sc)
hiveContext.setConf("hive.exec.dynamic.partition", "true")
hiveContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")

spark.sql("""CREATE EXTERNAL TABLE employee_table (Emp_Id STRING, First_Name STRING, Salary STRING)  PARTITIONED BY (Month STRING)  LOCATION 's3n://dev-emr-jupyter/anup/' 
TBLPROPERTIES ("skip.header.line.count"="1") """)


val csv_df = spark.read
  .format("csv")
  .option("header", "true")
  .load("s3n://dev-emr-jupyter/anup/test_data.csv")

import org.apache.spark.sql.SaveMode
csv_df.registerTempTable("csv")

spark.sql(""" INSERT OVERWRITE TABLE employee_table PARTITION(Month) select Emp_Id, First_Name, Salary, Month from csv""")


org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.Table.ValidationFailureSemanticException: Partition spec {month=, Month=May} contains non-partition columns;
  at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)

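You need to run a command before your insert statement so that partitions can be populated dynamically at runtime. By default, the dynamic partition mode is strict, which requires at least one static partition column in the PARTITION clause. Run the setting through the same spark session that executes the insert: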

spark.sql("set hive.exec.dynamic.partition.mode=nonstrict")

Try adding that line and running the insert again.

Edit 1:

I saw in your attached image that when you run csv_df.show(), the Salary column comes last instead of the Month column. Try referencing your columns explicitly in the insert statement, like: insert into table_name partition(month) (column1, column2..)..
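As a rough sketch of referencing the columns explicitly, the Scala helper here builds an INSERT OVERWRITE statement whose select list names every column, with the partition columns moved to the end, because Hive takes dynamic partition values from the trailing columns of the SELECT, in the order given in PARTITION(...). PartitionInsertSql and its methods are hypothetical names, not part of Spark or Hive; the helper only builds the SQL string and assumes you already know the source column names (for example from csv_df.columns).

```scala
// Hypothetical helper, not a Spark or Hive API: it only builds SQL text.
object PartitionInsertSql {

  // Returns the source columns with the partition columns moved to the end,
  // in the order they appear in partitionCols. Matching is case-insensitive,
  // since Hive treats column names case-insensitively.
  def reorderForDynamicPartition(columns: Seq[String], partitionCols: Seq[String]): Seq[String] = {
    val partitionKeys = partitionCols.map(_.toLowerCase).toSet
    // Non-partition columns keep their original relative order.
    val dataCols = columns.filterNot(c => partitionKeys.contains(c.toLowerCase))
    // Partition columns keep the spelling used by the source.
    val trailing = partitionCols.map { p =>
      columns.find(_.equalsIgnoreCase(p)).getOrElse(
        throw new IllegalArgumentException(s"Source has no partition column '$p'"))
    }
    dataCols ++ trailing
  }

  // Builds INSERT OVERWRITE TABLE ... PARTITION(...) SELECT ... FROM ...
  // with every column named explicitly and the partition columns last.
  def insertOverwrite(table: String, source: String,
                      columns: Seq[String], partitionCols: Seq[String]): String = {
    val selectList = reorderForDynamicPartition(columns, partitionCols).mkString(", ")
    s"INSERT OVERWRITE TABLE $table PARTITION(${partitionCols.mkString(", ")}) SELECT $selectList FROM $source"
  }
}
```

For example, PartitionInsertSql.insertOverwrite("employee_table", "csv", Seq("Emp_Id", "First_Name", "Month", "Salary"), Seq("Month")) returns INSERT OVERWRITE TABLE employee_table PARTITION(Month) SELECT Emp_Id, First_Name, Salary, Month FROM csv, which could then be passed to spark.sql. The column order in that call is illustrative only, not taken from the actual CSV file.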

Florin

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM