
Drop a Hive table partition through a Pig script

Currently we drop the table every day and then run the script that loads data into the tables. The script takes 3-4 hours, and the data is unavailable during that time. Our aim now is to keep the old Hive data available to analysts until the new data load has finished.

In an HQL script I achieve this by loading the daily data into Hive tables partitioned on load_year, load_month and load_day, and then removing yesterday's data by dropping its partition. What is the equivalent option in a Pig script? Can we alter the table from a Pig script? I don't want to run a separate HQL script to drop the partition after Pig finishes. Thanks.

Since HDP 2.3 you can use HCatalog commands inside Pig scripts, so you can drop a Hive table partition with an HCatalog command. The following is an example of dropping a Hive partition:

-- Set the correct hcat path 
set hcat.bin /usr/bin/hcat;
-- Drop a table partition or execute any other HCatalog command
sql ALTER TABLE midb1.mitable1 DROP IF EXISTS PARTITION(activity_id = "VENTA_ALIMENTACION",transaction_month = 1);
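
Applied to the daily load described in the question, the same idea might look like the sketch below. It is only an illustration under several assumptions: the table name mydb.daily_sales, the input schema, and every $parameter are hypothetical; the partition columns are assumed to be integers (string partition values would need quotes in the ALTER statement); and the script is assumed to be started with pig -useHCatalog so that HCatStorer is on the classpath.

```pig
-- Hypothetical invocation (all parameter names are made up for this sketch):
--   pig -useHCatalog -param input_path=/data/in/today \
--       -param year=2024 -param month=1 -param day=15 \
--       -param prev_year=2024 -param prev_month=1 -param prev_day=14 \
--       daily_load.pig

-- Point Pig at the hcat binary, as in the example above
set hcat.bin /usr/bin/hcat;

-- Load today's data (schema is an assumption)
daily = LOAD '$input_path' USING PigStorage(',') AS (id:chararray, amount:double);

-- Write it into today's partition of the Hive table through HCatalog
STORE daily INTO 'mydb.daily_sales'
    USING org.apache.hive.hcatalog.pig.HCatStorer('load_year=$year,load_month=$month,load_day=$day');

-- Drop yesterday's partition
sql ALTER TABLE mydb.daily_sales DROP IF EXISTS PARTITION (load_year = $prev_year, load_month = $prev_month, load_day = $prev_day);
```

Because the goal is to keep the old data visible until the new load completes, it is worth checking on your own cluster that the DROP really runs after the STORE has finished; Pig batches statements, and this sketch does not guarantee the ordering.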

Another way is to run a shell command with sh inside the Pig script. However, I had problems escaping special characters in ALTER commands, so in my opinion the first option is the best.

Regards, Roberto Tardío

