Hive query optimization: are aggregated columns in a select statement evaluated only once?

If there are multiple aggregated columns in one select, would they be evaluated only once? For example:

select
    date,
    count(userid) as uv,
    sum(isclick) as clickcnt,
    count(userid) / sum(isclick) as ctr
from
    user_access_log
group by
    1

Here both count(userid) and sum(isclick) are used twice. Would they be evaluated twice or only once? Will Hive do any query optimization?

This is too long for a comment.

It doesn't make a difference. The expense of running an aggregation query is almost entirely in bringing the rows for each group together. For the most part, the aggregations themselves are not expensive.

The one exception is count(distinct) (well, distinct in any form). This requires a good deal more overhead, because the values have to be de-duplicated within each group before they can be counted.
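To make the count(distinct) overhead concrete, here is a minimal sketch. It runs against SQLite through Python's sqlite3 module rather than Hive (an assumption made only so the example is runnable anywhere), and the sample rows are made up. It shows that count(distinct userid) gives the same result as two plain group by steps, first per (date, userid) and then per date, which is the extra de-duplication work involved. How Hive actually plans such a query depends on its version and settings.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("create table user_access_log (date text, userid text, isclick integer)")
conn.executemany(
    "insert into user_access_log values (?, ?, ?)",
    [
        # Made-up sample rows.
        ("2020-01-01", "u1", 1),
        ("2020-01-01", "u1", 0),
        ("2020-01-01", "u2", 1),
        ("2020-01-01", "u3", 0),
        ("2020-01-02", "u1", 0),
        ("2020-01-02", "u3", 1),
        ("2020-01-02", "u3", 1),
    ],
)

# Direct form: distinct users per date.
distinct_sql = """
select date, count(distinct userid) as uv, sum(isclick) as clickcnt
from user_access_log
group by date
order by date
"""

# Two-level form: collapse to one row per (date, userid), then count rows per date.
two_level_sql = """
select date, count(userid) as uv, sum(clicks) as clickcnt
from (select date, userid, sum(isclick) as clicks
      from user_access_log
      group by date, userid
     ) per_user
group by date
order by date
"""

distinct_rows = conn.execute(distinct_sql).fetchall()
two_level_rows = conn.execute(two_level_sql).fetchall()
print(distinct_rows == two_level_rows)
```

The inner group by is the expensive part: it needs every row for a (date, userid) pair brought together, which is a finer grouping than date alone.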

If you really want to run the aggregations only once, you can use a subquery:

select ual.*, (uv / clickcnt) as ctr
from (select date, count(userid) as uv, sum(isclick) as clickcnt
      from user_access_log
      group by 1
     ) ual;
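As a quick sanity check that the subquery form is a drop-in replacement, the sketch below compares it with the original single-select form. It uses SQLite via Python's sqlite3 module instead of Hive, with made-up sample rows; both are assumptions for illustration. Because SQLite truncates integer division, the ratio is multiplied by 1.0 here, and the grouping column is named explicitly instead of using group by 1.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("create table user_access_log (date text, userid text, isclick integer)")
conn.executemany(
    "insert into user_access_log values (?, ?, ?)",
    [
        # Made-up sample rows.
        ("2020-01-01", "u1", 1),
        ("2020-01-01", "u1", 0),
        ("2020-01-01", "u2", 1),
        ("2020-01-01", "u3", 0),
        ("2020-01-02", "u1", 0),
        ("2020-01-02", "u3", 1),
        ("2020-01-02", "u3", 1),
    ],
)

# Single select: each aggregate expression is written twice.
single_select_sql = """
select date,
       count(userid) as uv,
       sum(isclick) as clickcnt,
       1.0 * count(userid) / sum(isclick) as ctr
from user_access_log
group by date
order by date
"""

# Subquery: aggregates are written once, the ratio reuses their aliases.
subquery_sql = """
select ual.*, (1.0 * uv / clickcnt) as ctr
from (select date, count(userid) as uv, sum(isclick) as clickcnt
      from user_access_log
      group by date
     ) ual
order by date
"""

single_rows = conn.execute(single_select_sql).fetchall()
subquery_rows = conn.execute(subquery_sql).fetchall()
print(single_rows == subquery_rows)
```

This only confirms that the two forms return the same rows. It says nothing about how either one is planned in Hive; prefixing the real query with EXPLAIN is the way to inspect that.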

To be honest, I suspect that you actually want count(distinct userid), in which case the subquery might give a small improvement in performance.
