
Analytic count over partition with and without ORDER BY clause

I don't understand why the results differ when an ORDER BY clause is used in an analytic COUNT function.

Using a simple example:

with req as
 (select 1 as n, 'A' as cls
    from dual
  union
  select 2 as n, 'A' as cls
    from dual)
select req.*, count(*) over(partition by cls) as cnt from req;

gives the following result:

N   CLS CNT
2   A   2
1   A   2

Whereas, when adding an ORDER BY in the analytic clause, the result is different!

with req as
 (select 1 as n, 'A' as cls
    from dual
  union
  select 2 as n, 'A' as cls
    from dual)
select req.*, count(*) over(partition by cls order by n) as cnt from req;

The CNT column changed:

N   CLS CNT
1   A   1
2   A   2

Can someone explain, please?

Thanks

First, a link to docs. It's somewhat obscure, however.

The analytic clause consists of query_partition_clause, order_by_clause and windowing_clause. And a really important thing about windowing_clause is:

You cannot specify this clause unless you have specified the order_by_clause. Some window boundaries defined by the RANGE clause let you specify only one expression in the order_by_clause. Refer to "Restrictions on the ORDER BY Clause".

But it is not only that you cannot use windowing_clause without the order_by_clause; the two are tied together.

If you omit the windowing_clause entirely, then the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

The default windowing clause produces something like a running total. COUNT returns 1 for the first row, as there is only one row between the top of the window and the current row, 2 for the second row, and so on.

So in your first query there is no windowing at all, but there is the default windowing in the second one.

And you can simulate the behavior of the first query by specifying a fully unbounded window.

with req as
 (select 1 as n, 'A' as cls
    from dual
  union
  select 2 as n, 'A' as cls
    from dual)
select req.*, count(*) over(partition by cls order by n RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as cnt from req;

Yep

N   CLS CNT
1   A   2
2   A   2
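
The frames discussed in this answer can be compared side by side. The sketch below is not Oracle: it assumes Python's bundled sqlite3 module with SQLite 3.25 or later, which supports window functions and, as far as I know, uses the same default frame when ORDER BY is present. The `from dual` is dropped because SQLite has no such table, and the column aliases are made up for illustration.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for Oracle in this sketch.
conn = sqlite3.connect(":memory:")

query = """
with req as (
  select 1 as n, 'A' as cls
  union
  select 2 as n, 'A' as cls
)
select n,
       -- no ORDER BY: the whole partition is counted
       count(*) over (partition by cls) as cnt_no_order,
       -- ORDER BY with the implicit default frame
       count(*) over (partition by cls order by n) as cnt_default,
       -- the same default frame, written out explicitly
       count(*) over (partition by cls order by n
                      range between unbounded preceding and current row) as cnt_explicit,
       -- fully unbounded frame: behaves like the query without ORDER BY
       count(*) over (partition by cls order by n
                      range between unbounded preceding and unbounded following) as cnt_unbounded
from req
order by n
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The cnt_default and cnt_explicit columns should agree on every row, and so should cnt_no_order and cnt_unbounded.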

The easiest way to think about this: leaving the ORDER BY out is equivalent to "ordering" in a way that makes all rows in the partition "equal" to each other. Indeed, you can get the same effect by explicitly adding an ORDER BY clause like this: ORDER BY 0 (or "order by" any constant expression), or even, more emphatically, ORDER BY NULL.

Why you then get the COUNT() or SUM() etc. for the entire partition has to do with the default windowing clause: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. "RANGE" (as opposed to "ROWS") means that all rows "tied" with the current row are also included, even if they don't precede it. Since all rows are tied, the entire partition is included, no matter which row is "current."
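
A minimal sketch of the RANGE versus ROWS difference when every row is tied. It again assumes SQLite 3.25 or later through Python's sqlite3 module rather than Oracle, and the column aliases are invented for the example. With ORDER BY NULL all rows are peers, so the default RANGE frame covers the whole partition, while a ROWS frame still counts physical rows one at a time.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for Oracle in this sketch.
conn = sqlite3.connect(":memory:")

query = """
with req as (
  select 1 as n, 'A' as cls
  union
  select 2 as n, 'A' as cls
)
select n,
       -- default frame is RANGE: every tied row is included
       count(*) over (partition by cls order by null) as cnt_range,
       -- ROWS ignores ties and stops at the current physical row
       count(*) over (partition by cls order by null
                      rows between unbounded preceding and current row) as cnt_rows
from req
order by n
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Which row gets 1 and which gets 2 in cnt_rows is not determined, since the ordering among tied rows is arbitrary.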

Window functions perform the aggregation over each partition (the rows that share the same PARTITION BY value). When you omit the ORDER BY clause, the result is similar to GROUP BY, except that the aggregate is output on every row. It is also possible to omit PARTITION BY, in which case there is just one partition containing all the rows.

When you add an ORDER BY clause to a window function, it performs the calculation cumulatively, in that order, within each partition, and starts over for each new partition (group of values).

Values that are not distinct in the ORDER BY ordering are said to be peers. In COUNT() each peer gets the same result as its last peer, which creates gaps in the running count while the total is still reached on the final row.
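
To make the peer behavior concrete, here is a small sketch with a duplicated ORDER BY value. It assumes SQLite 3.25 or later via Python's sqlite3 module instead of Oracle, and the sample rows (1, 2, 2, 3) are made up for illustration. UNION ALL is used so the duplicate row survives.

```python
import sqlite3

# Assumption: SQLite (3.25+) stands in for Oracle in this sketch.
conn = sqlite3.connect(":memory:")

query = """
with req as (
  select 1 as n, 'A' as cls
  union all
  select 2 as n, 'A' as cls
  union all
  select 2 as n, 'A' as cls
  union all
  select 3 as n, 'A' as cls
)
select n,
       -- the two rows with n = 2 are peers and share one running count
       count(*) over (partition by cls order by n) as running_cnt,
       -- no PARTITION BY and no ORDER BY: one partition, counted in full
       count(*) over () as total_cnt
from req
order by n
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The running count goes 1, 3, 3, 4: both peers get the count as of the last peer, so 2 never appears, and the final row still matches the total.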

