How to partition a non-partitioned table on Hive?
Given a table with 360 days of data, we want to partition it by date to improve performance. Do we need to run the following SELECT command for each date? Is there a more efficient way to do this?
INSERT INTO TABLE <new_table> Partition (dt='2015-07-01')
SELECT * from <table> WHERE dt='2015-07-01'
If your new table is partitioned by dt (date), you should use dynamic partitioning. You don't need to specify each partition (in this case, each date) explicitly. Hive detects all the distinct dates in the data and creates the partitions automatically.
Remember to set these flags:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
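With 360 distinct dates, Hive's limits on dynamic partitions may also come into play. As a rough sketch, assuming the stock defaults are in effect on your cluster (1000 partitions per statement and 100 per mapper or reducer), you might need to raise the per-node limit too. The values below are illustrative, not recommendations:

```sql
-- Upper bound on dynamic partitions created by one statement (default 1000).
set hive.exec.max.dynamic.partitions=1000;
-- Upper bound per mapper/reducer (default 100); 360 dates can exceed it.
set hive.exec.max.dynamic.partitions.pernode=400;
```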
First, create your table:
create table db.my_table(column1 int, column2 string,
-- ...
)
comment 'I like partitioned tables'
partitioned by(dt string)
location '/path/to/file';
Now you can load the data into the dt partitions:
insert overwrite table db.my_table partition (dt) select * from other_table;
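One caveat with select *: Hive matches the dynamic partition column by position, so it takes the last column of the SELECT as dt. If dt is not the last column of other_table, listing the columns explicitly is safer. A minimal sketch, assuming other_table has only the columns column1, column2 and dt (the real column list is elided above, so adjust accordingly):

```sql
-- Assumes other_table has columns column1, column2 and dt; adjust to the real schema.
-- The dynamic partition column (dt) must come last in the SELECT list.
insert overwrite table db.my_table partition (dt)
select column1, column2, dt
from other_table;

-- List the partitions Hive created, one per distinct dt value.
show partitions db.my_table;
```

If the load worked, show partitions should return one dt=... entry for each date present in other_table.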