Hive partitioned query is scanning all partitions

Question

When I write a Hive query like the following:

select count(*)
from order
where order_month >= '2016-11';

Hadoop job information for Stage-1: number of mappers: 5; number of reducers: 1

I get only 5 mappers, which means that only the required partitions (2016-11 and 2016-12) are read.

Here is the same query written using functions:

select count(*)
from order
where order_month >= concat(year(DATE_SUB(to_date(from_unixtime(UNIX_TIMESTAMP())),10)),'-',month(DATE_SUB(to_date(from_unixtime(UNIX_TIMESTAMP())),10)));

Note:

concat(year(DATE_SUB(to_date(from_unixtime(UNIX_TIMESTAMP())),10)),'-',month(DATE_SUB(to_date(from_unixtime(UNIX_TIMESTAMP())),10))) = '2016-11'

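Hadoop job information for Stage-1: number of mappers: 216; number of reducers: 1

This time it reads all partitions (i.e. 2004-10 through 2016-12).

How can I modify the query so that it reads only the required partitions?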

Answer 1

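The unix_timestamp() function, called with no arguments, is non-deterministic and prevents proper optimization of the query, so the partition filter cannot be resolved at compile time. It has been deprecated since Hive 2.0 in favor of CURRENT_TIMESTAMP and CURRENT_DATE.

Use current_date instead. There is also no need to compute the year and month separately: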

where order_month >= substr(date_sub(current_date, 10),1,7)
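
For completeness, here is a minimal sketch of the full rewritten query, followed by one way to check which partitions Hive plans to read. It assumes the table and partition column from the question. The table name is backtick-quoted only because ORDER is a reserved word in HiveQL; whether that is needed depends on your setup. EXPLAIN DEPENDENCY lists the input partitions of a query without running it.

```sql
-- Rewritten count: current_date is fixed for the duration of the query,
-- so the optimizer can evaluate the filter up front and prune partitions.
select count(*)
from `order`
where order_month >= substr(date_sub(current_date, 10), 1, 7);

-- Inspect the plan's inputs without executing the job.
-- The "input_partitions" entry should list only the partitions
-- that satisfy the filter, not every partition of the table.
explain dependency
select count(*)
from `order`
where order_month >= substr(date_sub(current_date, 10), 1, 7);
```

A side benefit of substr(..., 1, 7) is that the month stays zero-padded (for example '2017-01'). The concat(year(...), '-', month(...)) form would give '2017-1' for January through September, which does not compare correctly against partition values such as '2017-01'.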
