Hive: Create a new table from an existing partitioned table

Question

I'm using Amazon's Elastic MapReduce and I have a Hive table created from a series of log files stored in Amazon S3, split into folders by day like so:

data/day=2011-09-01/log_file.tsv
data/day=2011-09-02/log_file.tsv

I am currently trying to create an additional table that filters out some unwanted activity in these log files, but I can't figure out how to do this and keep getting errors such as:

FAILED: Error in semantic analysis: need to specify partition columns because the destination table is partitioned.

My initial CREATE TABLE statement looks something like this:

CREATE EXTERNAL TABLE IF NOT EXISTS table1 (
... fields ...
)
PARTITIONED BY ( DAY STRING )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://bucketname/data/';

That initial table works fine and I've been able to query it with no problems.

How then should I create a new table that shares the structure of the previous one but simply filters out data? This doesn't seem to work:

CREATE EXTERNAL TABLE IF NOT EXISTS table2 LIKE table1;

FROM table1
INSERT OVERWRITE TABLE table2
SELECT * WHERE
col1 = '%somecriteria%' AND
more criteria...
;

As I've stated above, this returns:

FAILED: Error in semantic analysis: need to specify partition columns because the destination table is partitioned.

Thanks!

Answer 1

This always works for me:

CREATE EXTERNAL TABLE IF NOT EXISTS table2 LIKE table1;
INSERT OVERWRITE TABLE table2 PARTITION (day) SELECT col1, col2, ..., day FROM table1;
ALTER TABLE table2 RECOVER PARTITIONS;

Notice that I've added 'day' as a column in the SELECT statement. Also notice the ALTER TABLE line, which is necessary for Hive to become aware of the partitions that were newly created in table2.
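For the filtering asked about in the question, the same pattern might be extended with a WHERE clause. This is only a sketch: col1, col2 and the '%somecriteria%' pattern are placeholders taken from the question, it assumes dynamic partitioning is already enabled for the session, and it uses LIKE on the assumption that a wildcard match was intended (with =, the % characters are compared literally).

```sql
-- Sketch only: col1, col2 stand in for the full column list of table1.
-- Assumes dynamic partitioning is already enabled for this session.
INSERT OVERWRITE TABLE table2 PARTITION (day)
SELECT col1, col2, day          -- the partition column must come last
FROM table1
WHERE col1 LIKE '%somecriteria%';

-- List the partitions Hive now knows about for table2.
SHOW PARTITIONS table2;
```

Each distinct value of day that survives the filter ends up as its own partition of table2.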

Answer 2

I have never used the LIKE option, so thanks for showing me that. Will that actually create all of the partitions that the first table has as well? If not, that could be the issue. You could try using dynamic partitions:

create external table if not exists table2 like table1;
-- the partition column (day) is named in the partition clause and selected last
insert overwrite table table2 partition(day) select col1, col2, day from table1;

Might not be the best solution, as I think you have to specify your columns in the select clause (as well as the partition column in the partition clause).

And you must turn on dynamic partitioning.
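A sketch of the session settings this usually refers to. Both are standard Hive configuration properties, but whether they are already set depends on your Hive version and cluster defaults, so treat the need for them as an assumption:

```sql
-- Allow INSERT ... PARTITION (col) without a fixed partition value.
SET hive.exec.dynamic.partition=true;
-- In strict mode at least one partition column needs a static value;
-- the table here has only the single partition column day, so nonstrict is used.
SET hive.exec.dynamic.partition.mode=nonstrict;
```

These apply to the current session only and would be run before the insert statement.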

I hope this helps.

2 answers

Answer 1
1 2013-08-30 21:30:39

Answer 2
0 2011-12-09 12:19:51
