
External table with partitions in Hive

I have a bunch of TSV files in HDFS, in a directory structure that follows the partition convention, where event_dt is the partition key.

some_path/event_dt=2017-04-30
some_path/event_dt=2017-05-01

and so on.

The issue is that event_dt is also one of the columns in the files, specifically the second one. But I cannot declare it that way, since event_dt cannot appear both in the table schema and in the PARTITIONED BY clause. Doing so triggers:

 Column repeated in partitioning columns
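
For illustration, a definition along these lines would hit that error. This is only a sketch: the table name `events` and the columns `event_id` and `payload` are made up, and `/some_path` stands in for the real HDFS location. The only details taken from the description above are that the files are tab-separated and that event_dt is the second column.

```sql
-- Hypothetical schema; only event_dt (second column) comes from the question.
CREATE EXTERNAL TABLE events (
  event_id STRING,
  event_dt STRING,   -- same name as the partition column below
  payload  STRING
)
PARTITIONED BY (event_dt STRING)   -- rejected: event_dt is already a regular column
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/some_path';
```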

Is there a way around this other than using different names? It is, after all, the same information.

There are 3 options if you don't want to rename the column.

  1. If event_dt is the last column in your files, create the table without that column, since extra trailing fields in a delimited text file are ignored.
  2. During ingestion, drop this column from the data by copying it from its current location into a target table partitioned by event_dt (not the most efficient way).
  3. Create a view on top of your table that excludes one of the two columns. The underlying table will still need the rename.
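
As a rough sketch of option 3: the external table keeps a renamed copy of the in-file column, and a view hides it so that queries only ever see event_dt. The names `events_raw`, `events`, `event_id`, `payload` and `event_dt_in_file` are hypothetical, and `/some_path` stands in for the real HDFS location.

```sql
-- Underlying external table: the in-file column is renamed (hypothetical name)
-- so it no longer clashes with the partition column.
CREATE EXTERNAL TABLE events_raw (
  event_id         STRING,
  event_dt_in_file STRING,   -- second column of the TSV, renamed
  payload          STRING
)
PARTITIONED BY (event_dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/some_path';

-- Register the existing event_dt=... directories as partitions.
MSCK REPAIR TABLE events_raw;

-- View exposing only the partition column under the original name.
CREATE VIEW events AS
SELECT event_id, event_dt, payload
FROM events_raw;
```

Queries would then go against `events`, for example `SELECT * FROM events WHERE event_dt = '2017-05-01';`.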
