
How to insert the output of a Pig script into Hive external tables using a dynamically generated partition value?

I have written a Pig script that generates the tuples of a Hive table. I am trying to dump the results into the specific partition in HDFS where Hive stores the table data. As of now, the partition value I am using is a timestamp string generated inside the Pig script. I have to use this timestamp string to store my Pig script results, but I have no idea how to do that. Any help would be greatly appreciated.
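For context, a minimal sketch of a script like the one described might look as follows; the input path, field names, and relation names are purely illustrative, and the timestamp is produced in-script with Pig's CurrentTime() and ToString() builtins:

-- illustrative only: the tuples to be written, plus a timestamp generated inside the script
raw     = LOAD '/data/input' USING PigStorage('\t') AS (id:chararray, amount:int);
mydata  = FOREACH raw GENERATE id, amount;   -- rows destined for the Hive table
-- a timestamp string generated inside the script (the candidate partition value)
stamped = FOREACH mydata GENERATE *, ToString(CurrentTime(), 'yyyyMMdd') AS load_date;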

If I understand it right, you read some data from a partition of a Hive table and want to store it into another Hive table's partitions, right? A Hive partition (from an HDFS perspective) is just a subfolder whose name is constructed like this: fieldname_the_partitioning_is_based_on=value. For example, if you have a date partition it looks like this: hdfs_to_your_hive_table/date=20160607/
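To make that concrete, here is a sketch that writes straight into such a subfolder (the warehouse path and table name below are assumed examples, not taken from the answer); note that the Hive metastore still has to learn about the new folder, e.g. via ALTER TABLE ... ADD PARTITION or MSCK REPAIR TABLE on the Hive side:

-- write the results directly into the partition directory (path is illustrative)
STORE mydata INTO '/user/hive/warehouse/mytable/date=20160607' USING PigStorage(',');
-- afterwards, in Hive: ALTER TABLE mytable ADD IF NOT EXISTS PARTITION (date='20160607');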

So all you need to do is specify this output location in the STORE statement, for example via HCatStorer:

STORE mydata INTO '$HIVE_DB.$TABLE' USING org.apache.hive.hcatalog.pig.HCatStorer('date=$today');
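Two common ways to feed a dynamically generated value into that statement, sketched here under assumed names rather than taken from the answer: pass the value in from the shell as a Pig parameter, or let HCatStorer do dynamic partitioning by keeping the partition column as a field of the relation and leaving the partition spec empty.

-- variant 1: generate the value in the shell and pass it to the statement above,
-- so that $today (and $HIVE_DB, $TABLE) are substituted at submission time:
--   pig -param HIVE_DB=mydb -param TABLE=mytable -param today=$(date +%Y%m%d) myscript.pig

-- variant 2: dynamic partitioning: the partition value travels as a field whose name
-- matches the table's partition column, and HCatStorer gets no partition spec
-- (field names id/amount are illustrative, matching the sketch in the question)
withpart = FOREACH mydata GENERATE id, amount, ToString(CurrentTime(), 'yyyyMMdd') AS date;
STORE withpart INTO '$HIVE_DB.$TABLE' USING org.apache.hive.hcatalog.pig.HCatStorer();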
