
Cannot view data in hive partition table

I have an external table that is partitioned by a column called rundate. I can load data into the table using

DataFrame.write.mode(SaveMode.Overwrite).orc("s3://test/table")

I then create a partition using

spark.sql("ALTER TABLE table ADD IF NOT EXISTS PARTITION(rundate = '2017-12-19')")

The code runs fine and I can see the partitions, but I cannot see any data in the Hive table.

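You have not saved the partition data in the correct folder structure, and you manually added a partition at a location where no data exists. The write put the ORC files directly under s3://test/table, while a partition added without an explicit location resolves to s3://test/table/rundate=2017-12-19/, which is empty.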

Two things:

1. Make sure you save the data at the location the external table points to, using the folder structure Hive expects. For example, assume the external table is named table, the partition column is rundate, the partition value is 2017-12-19, and the table points to s3://test/table. Then save the data for partition 2017-12-19 as below:

DataFrame.write.mode(SaveMode.Overwrite).orc("s3://test/table/rundate=2017-12-19/")
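The layout Hive expects is `<table location>/<partition column>=<value>/`. A minimal sketch of building that path follows; `PartitionLayout` and `partitionPath` are hypothetical names for illustration, not part of Spark or Hive, and the helper assumes a single partition column.

```scala
object PartitionLayout {
  // Hypothetical helper: builds the Hive-style directory for one partition,
  // i.e. <tableLocation>/<column>=<value>/
  def partitionPath(tableLocation: String, column: String, value: String): String = {
    // Drop a trailing slash so the result never contains "//" after the table name
    val base = tableLocation.stripSuffix("/")
    s"$base/$column=$value/"
  }
}
```

The result can be passed to the same write call, for example `DataFrame.write.mode(SaveMode.Overwrite).orc(PartitionLayout.partitionPath("s3://test/table", "rundate", "2017-12-19"))`.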

2. Once the save succeeds, run the command below to update the Hive metastore with the newly added partition.

syntax: msck repair table <tablename>
msck repair table table
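From Spark, the metastore update can be issued through spark.sql, the same way the question runs its ALTER TABLE statement. The sketch below only builds the statement strings; `PartitionDdl` and its method names are hypothetical. The second helper is an alternative that is not part of the answer above: it registers a single partition with an explicit LOCATION instead of scanning the whole table directory. No escaping is done, so it assumes trusted table and partition names.

```scala
object PartitionDdl {
  // Asks the metastore to discover every partition directory under the table location
  def repair(table: String): String =
    s"MSCK REPAIR TABLE $table"

  // Alternative (assumption, not from the answer): register one partition
  // and state explicitly where its files live
  def addPartition(table: String, column: String, value: String, location: String): String =
    s"ALTER TABLE $table ADD IF NOT EXISTS PARTITION ($column = '$value') LOCATION '$location'"
}
```

Either string can then be executed with `spark.sql(PartitionDdl.repair("table"))` or `spark.sql(PartitionDdl.addPartition("table", "rundate", "2017-12-19", "s3://test/table/rundate=2017-12-19/"))`.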
