
Deleting Kudu range partitions less than the given string

I want to delete all Kudu RANGE partitions from a Kudu table whose partition values are less than a given date string. I am using the following query, but it's not working. Can someone please suggest a workaround?

alter table test_table drop if exists range partition values < '2010-01-31';

My Impala version is 2.6x, and it appears not to work with the '<' comparison. I can't use '=' because this will be done dynamically, and I need a single query to wipe out all empty Kudu partitions before the passed date string.

I think you cannot use this syntax on your version; it looks like the feature was added in Impala 2.8.

Documentation

To drop or alter multiple partitions:

In Impala 2.8 and higher, the expression for the partition clause with a DROP or SET operation can include comparison operators such as <, IN, or BETWEEN, and Boolean operators such as AND and OR.

For example, you might drop a group of partitions corresponding to a particular date range after the data "ages out":

alter table historical_data drop partition (year < 1995);
alter table historical_data drop partition (year = 1996 and month between 1 and 6);

For tables with multiple partition key columns, you can specify multiple conditions separated by commas, and the operation only applies to the partitions that match all the conditions (similar to using an AND clause):

alter table historical_data drop partition (year < 1995, last_name like 'A%');

This technique can also be used to change the file format of groups of partitions, as part of an ETL pipeline that periodically consolidates and rewrites the underlying data files in a different file format:

alter table fast_growing_data partition (year = 2016, month in (10,11,12)) set fileformat parquet;

Here is the ticket in which it was added, if you want to take a look: Jira issue

I'm not sure how best to handle this on your version. Maybe you can write a script or some Spark shell code that lists all partitions, selects only the ones you want, and concatenates them into a single query that your Impala can handle.
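As a rough illustration of that scripting idea, here is a minimal Python sketch. It assumes you already have the table's range partition bounds as (lower, upper) pairs of ISO date strings; how you obtain that list depends on your setup and is not shown. The function name build_drop_statements and the bounds representation are hypothetical, and the generated DROP RANGE PARTITION syntax is an assumption that you should check against what your Impala version accepts.

```python
from typing import List, Optional, Tuple

# Hypothetical representation of one range partition: (lower, upper),
# lower bound inclusive, upper bound exclusive; None means unbounded.
Bounds = Tuple[Optional[str], Optional[str]]


def build_drop_statements(table: str, partitions: List[Bounds], cutoff: str) -> List[str]:
    """Return one ALTER TABLE statement per partition lying entirely below cutoff."""
    statements = []
    for lower, upper in partitions:
        # Keep partitions that are unbounded above or that reach past the cutoff.
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        if upper is None or upper > cutoff:
            continue
        if lower is None:
            spec = f"VALUES < '{upper}'"
        else:
            spec = f"'{lower}' <= VALUES < '{upper}'"
        # Assumed syntax; verify it against your Impala version before running.
        statements.append(f"ALTER TABLE {table} DROP IF EXISTS RANGE PARTITION {spec}")
    return statements


if __name__ == "__main__":
    # Example bounds, made up for illustration only.
    existing = [
        (None, "2010-01-01"),
        ("2010-01-01", "2010-01-31"),
        ("2010-01-31", "2010-03-01"),
    ]
    for stmt in build_drop_statements("test_table", existing, "2010-01-31"):
        print(stmt + ";")
```

The printed statements can then be saved to a file and passed to impala-shell, so the cutoff date stays a single parameter even though one statement is issued per partition.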
