How to merge small files from existing partitions in Hive?
How can I merge the small files of several existing partitions into one large file in a single partition?
For example, I have a table user1 with the columns fname and lname, partitioned by the column day.
I created the table with the script below:
CREATE TABLE user1 (fname string, lname string) PARTITIONED BY (day int);
After inserting data, the partitioned table looks like this:
fname lname day
.....................
AA AAA 20170201 ....>partition 20170201
BB BBB 20170201
...................
CC CCC 20170202 ......>partition 20170202
DD DDD 20170202
....................
EE EEE 20170203 .......>partition 20170203
FF FFF 20170203
.......................
GG GGG 20170204 ........>partition 20170204
HH HHH 20170204
.......................
When I run a select query that filters on the partition column, i.e. day=20170201:
select * from user1 where day=20170201;
it gives the result below:
AA AAA 20170201
BB BBB 20170201
Based on the table above, I want to merge all the small files of day=20170201, day=20170202 and day=20170203 into the partition day=20170203 of my partitioned table (user1). It should look like this:
fname lname day
.....................
AA AAA 20170201
BB BBB 20170201
CC CCC 20170202
DD DDD 20170202
EE EEE 20170203 .......>partition 20170203
FF FFF 20170203
.......................
GG GGG 20170204 ........>partition 20170204
HH HHH 20170204
.......................
Can you please suggest how I can achieve this?
Thanks in advance.
Create a new table partitioned by a new field, partition_day:
CREATE TABLE user_new (fname string, lname string, day int) PARTITIONED BY (partition_day int);
Load the data into the new table (the case expression defines the new partition):
insert overwrite table user_new partition (partition_day) select fname, lname, day, case when day <= 20170203 then 20170203 when day > 20170203 then 20170204 end as partition_day from user1;
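The insert into user_new is a dynamic-partition insert, because partition_day is computed per row rather than given as a constant. A minimal sketch of the session settings such an insert usually needs, plus a quick check of the result, might look like the following. The hive.merge.* lines are an assumption about what helps here: whether they apply, and how many files each partition ends up with, depends on the execution engine and Hive version in use.

```sql
-- Allow dynamic partitions, including the case where every
-- partition column is dynamic (as in the insert into user_new).
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Assumption: ask Hive to merge small output files at the end of the job.
-- These are MapReduce-oriented settings; other engines use different ones.
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;

-- After running the insert overwrite into user_new, inspect the result.
SHOW PARTITIONS user_new;

-- Rows that originally had day = 20170201, 20170202 and 20170203
-- should now all come back from the single partition 20170203.
SELECT fname, lname, day
FROM user_new
WHERE partition_day = 20170203;
```

Note that this leaves user1 untouched: the merged data lives in user_new, and the original day value is kept as an ordinary column so each row still records the date it belonged to.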