External table with partitions in Hive

Question

I have a bunch of TSV files in HDFS, in a directory structure that follows the partition convention, where event_dt is the partition:

some_path/event_dt=2017-04-30
some_path/event_dt=2017-05-01

and so on.

The issue is that event_dt is also one of the columns in the files, the second one in particular. But I cannot declare it as a column, since event_dt cannot appear both in the table schema and in the PARTITIONED BY clause. That triggers:

 Column repeated in partitioning columns

Is there a way around this other than using different names? It is, after all, the same information.
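For illustration, a DDL of roughly this shape reproduces the error. The table name `events`, the extra columns `id` and `payload`, and the location `/some_path` are hypothetical placeholders; only `event_dt`, its position as the second field, and the tab delimiter come from the question.

```sql
-- Hypothetical table: event_dt is declared both as a data column
-- and as the partition column, which Hive rejects.
CREATE EXTERNAL TABLE events (
  id       STRING,
  event_dt STRING,   -- second field in each TSV row
  payload  STRING
)
PARTITIONED BY (event_dt STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/some_path';
-- Hive refuses this with "Column repeated in partitioning columns".
```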

Answer 1

There are 3 options if you don't want to rename the column.

1. If event_dt is the last column in your TSV files, create the table without this column. Hive's delimited-text SerDe ignores trailing fields that have no matching column, so event_dt is then populated only from the partition directory.
2. During ingestion, strip this field from your data: move the data from its landing location into a target table partitioned by event_dt (not the most efficient approach).
3. Create a view on top of your table that exposes only one of the two columns. The underlying table will still need the renamed column, though.
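The three options might look roughly like this in HiveQL. This is a sketch only: the table names (`events`, `events_staging`, `events_part`, `events_base`, `events_v`), the extra columns `id` and `payload`, the renamed column `event_dt_col`, and the locations `/some_path` and `/staging_path` are all hypothetical, and the declared column order must match your actual files.

```sql
-- Option 1: only valid if event_dt is the LAST field of each row.
-- Declare the leading fields only; the trailing event_dt field is
-- ignored and the value comes from the event_dt=... directory name.
CREATE EXTERNAL TABLE events (
  id      STRING,
  payload STRING
)
PARTITIONED BY (event_dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/some_path';

-- Register the existing event_dt=... directories as partitions.
MSCK REPAIR TABLE events;

-- Option 2: land raw rows in an unpartitioned staging table, where
-- event_dt is an ordinary column, then rewrite them into a
-- partitioned table. With dynamic partitioning, the last SELECT
-- expression supplies the partition value.
CREATE EXTERNAL TABLE events_staging (
  id       STRING,
  event_dt STRING,
  payload  STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/staging_path';

CREATE TABLE events_part (
  id      STRING,
  payload STRING
)
PARTITIONED BY (event_dt STRING);

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE events_part PARTITION (event_dt)
SELECT id, payload, event_dt
FROM events_staging;

-- Option 3: keep every field in the base table under a renamed data
-- column, and expose a single event_dt through a view.
CREATE EXTERNAL TABLE events_base (
  id           STRING,
  event_dt_col STRING,   -- renamed copy of the 2nd field
  payload      STRING
)
PARTITIONED BY (event_dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/some_path';

MSCK REPAIR TABLE events_base;

CREATE VIEW events_v AS
SELECT id, event_dt, payload
FROM events_base;
```

Option 1 and option 3 read the same files in place, while option 2 copies the data once, which is why the answer calls it the least efficient.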

Answer 1 (score: -1), 2017-06-09 04:38:00