So, I have a table whose data is partitioned by datetime (dt) and stored in S3, where the partitions look like this:
dt=2019-03-22/
dt=2019-03-23/
dt=2019-03-24/
and so on. What I want to do is change how the data is partitioned, from this pattern into sub-partitions like this:
year=2019/month=03/day=22/
year=2019/month=03/day=23/
year=2019/month=03/day=24/
But I don't want to alter the original table, so I created an external table that points to another location in S3, which will hold this new partition layout. I tried creating the table pointing at that location (using the same schema as the original one):
CREATE EXTERNAL TABLE `test_partition_new`(
`order_id` string,
`outlet_code` string,
.
.
.
.
`business_date` string,
.
.
.
.
)
PARTITIONED BY (
`year` string,
`month` string,
`day` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
's3://data-test/test_partition/db.new_partition/'
TBLPROPERTIES (
'orc.compress'='SNAPPY'
)
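A side note, assuming Hive semantics for external tables: if files are ever written under this S3 location outside of Hive, the metastore won't know about the new partition directories until you register them. A minimal sketch of the two usual ways to do that (table and partition values are the ones from this question):

```sql
-- Scan the table LOCATION and register any partition directories found there
MSCK REPAIR TABLE test_partition_new;

-- Or register a single partition explicitly
ALTER TABLE test_partition_new ADD IF NOT EXISTS
  PARTITION (year='2019', month='03', day='22')
  LOCATION 's3://data-test/test_partition/db.new_partition/year=2019/month=03/day=22/';
```

This isn't needed for rows written via `INSERT ... PARTITION` from Hive itself, which registers the partitions as part of the insert.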
which is partitioned by year, month and day respectively. From what I understand, I should insert data from the original table into this one. How should I insert data into this new table when the date to partition by comes from the column 'business_date', which contains values like '2019-03-20'? Is there any function that can separate this column into three columns containing year, month and day?
If the date format is consistent, you can split it into three columns and load them:
INSERT INTO `test_partition_new` PARTITION(year,month,day)
SELECT --cols to select
,SPLIT(business_date,'-')[0] --year
,SPLIT(business_date,'-')[1] --month
,SPLIT(business_date,'-')[2] --day
FROM ORIGINAL_TABLE
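Assuming this runs in Hive, a dynamic-partition insert like the one above also needs dynamic partitioning enabled for the session, and the partition columns must come last in the SELECT, in the same order as in the PARTITION clause. A fuller sketch (the non-partition column list is abbreviated, since the full schema isn't shown):

```sql
-- Dynamic-partition inserts require these session settings in Hive
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT INTO TABLE test_partition_new PARTITION (year, month, day)
SELECT
  order_id,
  outlet_code,
  -- ... remaining non-partition columns, in table order ...
  business_date,
  SPLIT(business_date, '-')[0],  -- year,  e.g. '2019'
  SPLIT(business_date, '-')[1],  -- month, e.g. '03'
  SPLIT(business_date, '-')[2]   -- day,   e.g. '22'
FROM original_table;
```

Because the 'YYYY-MM-DD' format is already zero-padded, the month and day values land as '03' and '22', matching the `month=03/day=22/` directory layout you want.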