How to partition an unpartitioned table in Hive?

Question

Given a table with 360 days of data, we want to partition it by date to improve performance. Do we need to run the following INSERT ... SELECT command once for each date? Is there a more efficient way to do this?

INSERT INTO TABLE <new_table> Partition (dt='2015-07-01')
SELECT * from <table> WHERE dt='2015-07-01'

Answer 1

If your new table is partitioned by dt (date), you should use dynamic partitioning. You don't need to specify a particular partition (in this case, a date). Hive detects all the distinct dates in the data and creates the partitions automatically.

Remember to set these flags:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
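
As a minimal sketch of how these flags combine with the insert, the statement below drops the fixed date from the PARTITION clause and lets Hive take the partition value from the data. The names source_table, new_table, column1 and column2 are hypothetical placeholders, and the sketch assumes new_table already exists and is partitioned by dt.

```sql
-- Enable dynamic partitioning for this session.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Hypothetical names: source_table is the unpartitioned table (it has a dt column),
-- new_table is the target partitioned by dt.
-- No value is given for dt in the PARTITION clause, so Hive reads it from each row.
-- The partition column must be the last column in the SELECT list.
INSERT INTO TABLE new_table PARTITION (dt)
SELECT column1, column2, dt
FROM source_table;
```

A single statement like this replaces the 360 per-date inserts from the question.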

Answer 2

First create your table:

create table db.my_table(column1 int, column2 string,
                     -- ...
)
comment 'I like partitioned tables'
partitioned by(dt string)
location '/path/to/file';

Now you can load the data into dt partitions (dt must be the last column returned by the SELECT):

insert overwrite table db.my_table partition (dt) select * from other_table;
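
With 360 distinct dates, a load like this can run into Hive's dynamic partition limits, which by default are usually 1000 partitions per job and 100 per node. The following is a sketch only: the limit values are illustrative assumptions, and column1 and column2 stand in for the full column list that the table definition elides.

```sql
-- Illustrative limits; the values are assumptions, tune them for your cluster.
set hive.exec.max.dynamic.partitions=1000;
set hive.exec.max.dynamic.partitions.pernode=400;

-- List the columns explicitly so that dt, the partition column, comes last.
-- DISTRIBUTE BY dt sends all rows for a given date to the same reducer,
-- so each partition is written by a single task.
insert overwrite table db.my_table partition (dt)
select column1, column2, dt
from other_table
distribute by dt;

-- Check which partitions were created (expect one per date).
show partitions db.my_table;
```

Listing the columns by name instead of using select * avoids depending on the column order of other_table.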

Answer 1: posted 2015-08-06 16:43:07

Answer 2 (accepted): posted 2015-08-06 17:06:05
