All of a sudden I am unable to read a Hive external S3 table from Spark. I noticed that subfolders have been created under a few of the partitions.
Is there a parameter or setting that can be configured so Hadoop doesn't create these subfolders?
When I manually delete the subfolders from S3, I can read the table again, but I need to find a way to stop these subfolders from being created at random in the future.
CREATE EXTERNAL TABLE `mydb.mytable`(
`id` string COMMENT 'from deserializer',
`attribute_value` string COMMENT 'from deserializer',
`attribute_date` string COMMENT 'from deserializer',
`source_id` string COMMENT 'from deserializer')
PARTITIONED BY (`partition_source_id` int)
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://path/my_data'
TBLPROPERTIES ('transient_lastDdlTime'='1567170767')
When I run a select * query, I get:
error: java.io.IOException: Not a file: s3://my_path/partition_source_id=11/1
1 statement failed.
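If the subfolders are legitimate job output (for example, Hive on Tez writes HIVE_UNION_SUBDIR_* directories for UNION ALL queries), a common workaround is to let readers descend into subdirectories instead of deleting them. This is a sketch, assuming your Hive and Spark versions honor these Hadoop settings; verify against your distribution before relying on it:

```sql
-- Hive side: allow input formats to recurse into subdirectories
SET mapreduce.input.fileinputformat.input.dir.recursive=true;
SET hive.mapred.supports.subdirectories=true;

-- Spark side: the same Hadoop setting can be applied per session, e.g.
-- spark.sql("SET mapreduce.input.fileinputformat.input.dir.recursive=true")
```

This only makes the existing layout readable; it does not stop whatever job is creating the subfolders.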
I don't think this DDL creates the subfolders. If there is a job that loads data into 's3://path/my_data' and then runs ADD PARTITION against mydb.mytable, you should take a look at that job.
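To track down which job is responsible, it can help to first enumerate exactly which partitions contain nested directories. A minimal sketch of that check, assuming you have already listed the object keys under the table root (e.g. with aws s3 ls --recursive or boto3); the helper name and prefix are hypothetical:

```python
def partitions_with_subfolders(keys, table_prefix):
    """Return the partition directories that contain nested subfolders.

    keys: iterable of S3 object keys under the table's root prefix.
    table_prefix: the table root, e.g. "my_data".
    A well-formed layout is <prefix>/partition_source_id=NN/<file>;
    any key nested one level deeper indicates a subfolder.
    """
    bad = set()
    for key in keys:
        if not key.startswith(table_prefix):
            continue
        rel = key[len(table_prefix):].lstrip("/")
        parts = rel.split("/")
        if len(parts) > 2:  # partition dir + subfolder + file (or deeper)
            bad.add(parts[0])
    return sorted(bad)


keys = [
    "my_data/partition_source_id=11/1/part-00000",  # nested subfolder "1"
    "my_data/partition_source_id=12/part-00000",    # normal layout
]
print(partitions_with_subfolders(keys, "my_data"))
```

Cross-referencing the flagged partitions against job run times in your scheduler usually identifies the writer that produces the extra directory level.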