
Create HIVE partitioned table HDFS location assistance

I hope someone can help me with creating external Hive partitioned tables that automatically pick up data from comma-delimited files residing in an HDFS directory. My understanding, or lack thereof, is that when you define a CREATE EXTERNAL TABLE with PARTITIONED BY and provide it with a LOCATION, Hive should recursively scan each sub-directory and load the data into the newly created partitioned external table. The following should provide some additional insight into my troubles.

Sample HDFS directory structure:

    /data/output/dt=2014-01-01
    /data/output/dt=2014-01-02
    /data/output/dt=2014-01-03
    ...
    /data/output/dt=2014-05-21

Each 'dt=' sub-directory contains the delimited file.

The following is an example of my CREATE EXTERNAL TABLE syntax:

    CREATE EXTERNAL TABLE master_test (
        UID string,
        lname string,
        fname string,
        addr string,
        city string,
        state string,
        orderdate string,
        shipdate string)
    PARTITIONED BY (dt STRING)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION '/data/output/';

After creating the master_test external table, I expected all of my delimited files to already be queryable through it. Instead, the only way I can get data into the newly defined external table is through an ALTER TABLE ADD PARTITION statement, for example: ALTER TABLE master_test ADD PARTITION (dt='2014-04-16'). Or, if I explicitly define the location of the delimited file, it adds that individual file to the defined table.
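For reference, the manual approach described above might look roughly like the sketch below. The explicit LOCATION paths are assumptions that simply follow the sample directory layout shown earlier; IF NOT EXISTS is added so that the statement can be re-run safely.

```sql
-- Register one partition by hand; its location defaults to
-- <table location>/dt=2014-04-16 when LOCATION is omitted.
ALTER TABLE master_test ADD IF NOT EXISTS PARTITION (dt='2014-04-16');

-- Register several partitions in one statement, pointing each at an
-- explicit directory (paths assumed from the sample layout above).
ALTER TABLE master_test ADD IF NOT EXISTS
    PARTITION (dt='2014-01-01') LOCATION '/data/output/dt=2014-01-01'
    PARTITION (dt='2014-01-02') LOCATION '/data/output/dt=2014-01-02';
```

This works, but it means one partition clause per 'dt=' directory, which is what I am hoping to avoid.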

Any suggestions or guidance would be greatly appreciated.

You can use MSCK REPAIR TABLE to automatically discover the partitions. Take a look at the doc: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)
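As a minimal sketch against the master_test table from the question: Hive keeps partition information in its metastore, and CREATE EXTERNAL TABLE only records the table location without scanning the sub-directories, which is why the table looks empty at first. MSCK REPAIR TABLE compares the directories under the table location with the metastore and registers the ones that are missing. This assumes the directories follow the dt=<value> naming convention, as they do in the sample layout; the final query is only an illustrative sanity check.

```sql
-- Scan /data/output/ for dt=... directories that are not yet in the
-- metastore and add them as partitions of master_test.
MSCK REPAIR TABLE master_test;

-- List the partitions Hive now knows about.
SHOW PARTITIONS master_test;

-- Illustrative check: count the rows visible in each partition.
SELECT dt, COUNT(*) AS row_count
FROM master_test
GROUP BY dt;
```

When new 'dt=' directories are added to HDFS later, the repair has to be run again, since the metastore is not updated automatically.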
