

Hive Create Table Partitions from file name

New to Hadoop. I know the syntax for creating a table in Hive, and I want to create a table with 3 partition keys, but the keys are in the file names.

File name example: ServerName_ApplicationName_ApplicationName.XXXX.log.YYYY-MM-DD

There are hundreds of files in a directory. I want to create a table with the following partition keys taken from the file name: ServerName, ApplicationName, Date, and load all the files into the table. A Hive script would be the preference, but I am open to any other ideas.

(The files are CSV, and I know the schema (column definitions) of the files.)

I assume the file name is in the format ServerName_ApplicationName.XXXX.log.YYYY-MM-DD (I removed the second "ApplicationName", assuming it to be a typo).

Create a table on the contents of the original files. Something like:

create external table default.stack
(col1 string,
 col2 string,
 col3 string,
 col4 int,
 col5 int)
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY ','
 STORED AS TEXTFILE  -- assumed: the source files are plain CSV text
 location 'hdfs://nameservice1/location1...';

Create another partitioned table in another location, like:

create external table default.stack_part
(col1 string,
 col2 string,
 col3 string,
 col4 int,
 col5 int)
 PARTITIONED BY (servername string, applicationname string, load_date string)
 STORED AS AVRO  -- you can choose any format for the final files
 location 'hdfs://nameservice1/location2...';

Insert into the partitioned table from the base table using the query below:

set hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.compress.output=true;
set hive.exec.parallel=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
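
With hundreds of files, the number of distinct (servername, applicationname, load_date) combinations may exceed Hive's default dynamic-partition limits. A minimal sketch of extra settings that could sit alongside the ones above; the numeric values are placeholder assumptions, not recommendations, and should be sized to the actual data:

```sql
-- Dynamic partitioning must be enabled (it is on by default in recent Hive versions).
set hive.exec.dynamic.partition=true;
-- Upper bounds on the partitions one statement, and one mapper or reducer, may create.
-- The values below are illustrative assumptions only.
set hive.exec.max.dynamic.partitions=5000;
set hive.exec.max.dynamic.partitions.pernode=1000;
```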

Insert overwrite table default.stack_part
partition ( servername, applicationname, load_date)
select *, 
       split(reverse(split(reverse(INPUT__FILE__NAME),"/")[0]),"_")[0] as servername
       ,split(split(reverse(split(reverse(INPUT__FILE__NAME),"/")[0]),"_")[1],'[.]')[0] as applicationname
       ,split(split(reverse(split(reverse(INPUT__FILE__NAME),"/")[0]),"_")[1],'[.]')[3] as load_date
from default.stack;
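
The nested split/reverse calls can be checked against a single literal path before running the full insert. A small sketch, assuming a Hive version that allows a select without a FROM clause (0.13 or later); the path and the names web01 and billing are made up for illustration:

```sql
-- Same parsing as the insert above, applied to one hypothetical path.
select
  split(base, '_')[0]                   as servername,
  split(split(base, '_')[1], '[.]')[0]  as applicationname,
  split(split(base, '_')[1], '[.]')[3]  as load_date
from (
  -- Strip the directory part: reverse, take everything before the first "/", reverse back.
  select reverse(split(reverse(p), '/')[0]) as base
  from (select 'hdfs://nameservice1/some/dir/web01_billing.0001.log.2019-05-14' as p) t1
) t2;
```

For this path the query should return web01, billing and 2019-05-14. The parsing relies on the server name containing no underscore and on the application name and the XXXX part containing no dots; file names that break those assumptions would need a different expression.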

I have tested this and it works.
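
One way to confirm the load on your own data, sketched here with the table names from above, is to list the partitions that were created and compare row counts between the two tables:

```sql
-- Each partition should appear as servername=.../applicationname=.../load_date=...
show partitions default.stack_part;

-- The two counts should match if every row landed in a partition.
select count(*) from default.stack;
select count(*) from default.stack_part;
```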
