SQL window functions: Performance impact of returning the same avg() many times?

I would like to SELECT a bunch of rows from table A, along with the results of aggregate functions like avg(A.price) and avg(A.distance).

Now, the SELECT query takes a good bit of time, so I don't want to run one query to get the rows and another to get the averages. If I did that, I'd be running the query to SELECT the appropriate rows twice.

But looking at the PostgreSQL window function documentation ( http://www.postgresql.org/docs/9.1/static/tutorial-window.html ), it seems that if I use a window function to return the aggregate results alongside the rows, every single row returned would contain the results of the aggregate functions. In my case, since the aggregation is over all the rows returned by the main SELECT query and not a subset of them, this seems wasteful.

What are the performance implications of returning the same avg() many times, given that I'm selecting a subset of the rows in A but doing aggregate queries across the entire subset? In particular, does Postgres recompute the average every time, or does it cache the average somehow?

By way of analogy: look at the window function docs and pretend that depname is 'develop' for every row returned by the SELECT query, so that the average is the same for every row because it was computed across all returned rows. How many times is that average computed?

You can use a CTE to do what you want. According to the Postgres documentation:

A useful property of WITH queries is that they are evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries. Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects. However, the other side of this coin is that the optimizer is less able to push restrictions from the parent query down into a WITH query than an ordinary sub-query. The WITH query will generally be evaluated as stated, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.)

You can structure your final results using a query such as:

with cte as (
    <your basic select goes here>
)
select *
from cte
cross join (
    select <averages here>
    from cte
) const
where <your row filter here>
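
To make the template concrete, here is a minimal sketch. The inline VALUES list stands in for table A, and the city column and its filter are made up for illustration; only price and distance come from the question. As far as I know, PostgreSQL 12 and later may inline a CTE that is referenced only once, but cte is referenced twice here, so it should still be computed a single time by default.

```sql
-- Stand-in for table A: hypothetical sample data, not from the original post.
WITH a(id, city, price, distance) AS (
    VALUES (1, 'Oslo', 10.0, 2.0),
           (2, 'Oslo', 20.0, 4.0),
           (3, 'Rome', 30.0, 6.0)
),
-- The expensive row selection, written once.
cte AS (
    SELECT id, price, distance
    FROM a
    WHERE city = 'Oslo'   -- hypothetical row filter
)
-- Attach the single-row aggregate result to every selected row.
SELECT cte.*, const.avg_price, const.avg_distance
FROM cte
CROSS JOIN (
    SELECT avg(price)    AS avg_price,
           avg(distance) AS avg_distance
    FROM cte
) AS const;
```

The subquery const returns exactly one row, so the cross join does not multiply the result; it only adds the two average columns to each row of cte.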

According to section 7.2.4 of the docs:

When multiple window functions are used, all the window functions having syntactically equivalent PARTITION BY and ORDER BY clauses in their window definitions are guaranteed to be evaluated in a single pass over the data.
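
Applied to the question, that would look roughly like the sketch below. Both averages share one named, empty window, so their PARTITION BY and ORDER BY clauses are trivially equivalent. The VALUES list again stands in for table A, and the city filter is a made-up example.

```sql
-- Stand-in for table A: hypothetical sample data, not from the original post.
WITH a(id, city, price, distance) AS (
    VALUES (1, 'Oslo', 10.0, 2.0),
           (2, 'Oslo', 20.0, 4.0),
           (3, 'Rome', 30.0, 6.0)
)
SELECT id, price, distance,
       avg(price)    OVER w AS avg_price,     -- same window as below
       avg(distance) OVER w AS avg_distance
FROM a
WHERE city = 'Oslo'     -- hypothetical row filter
WINDOW w AS ();         -- empty window: the whole result set is one partition
```

To check what the planner actually does on your own data, prefix the query with EXPLAIN ANALYZE. I would expect both aggregates to appear under a single WindowAgg node, but treat that as something to verify on your version rather than a guarantee.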
