简体   繁体   中英

Trying to understand over() and partition by

I am trying to get the over and partition by functionality wrapped around my head. Here is an example that I just do not understand.

Here is the data I have:

SALESORDERID       ORDERDATE 
43894              08/01/2001 
43664              07/01/2001 
43911              08/01/2001 
43867              08/01/2001 
43877              08/01/2001 
44285              10/01/2001 
44501              11/01/2001 
43866              08/01/2001 
43895              08/01/2001 
43860              08/01/2001

When I run this query:

select Row_Number() over(partition by orderdate order by orderdate asc) 
    as Rownumber, salesorderid, orderdate
from test2
order by rownumber

Here are the results I get:

ROWNUMBER     SALESORDERID       ORDERDATE 
1             43664              07/01/2001 
1             43911              08/01/2001 
1             44109              09/01/2001 
1             44483              11/01/2001 
1             44285              10/01/2001 
2             43867              08/01/2001 
2             44501              11/01/2001 
3             43895              08/01/2001 
4             43894              08/01/2001 
5             43877              08/01/2001 

Can someone explain this query to me. I am not new to SQL but windowing I have been struggling with and can't get my head wrapped around this.

Try ordering by order date, you'll see the results more easily

select Row_Number() over(partition by orderdate order by orderdate asc) 
    as Rownumber, salesorderid, orderdate
from test2
order by orderdate;

should give (i've added blank lines for clarity)

ROWNUMBER     SALESORDERID       ORDERDATE
1             43664              07/01/2001

1             43911              08/01/2001
2             43867              08/01/2001
3             43895              08/01/2001
4             43894              08/01/2001
5             43877              08/01/2001

1             44109              09/01/2001

1             44285              10/01/2001

1             44483              11/01/2001
2             44501              11/01/2001

You'll notice that the result is divided into 'partitions', each partition being the set of rows with identical orderdates. That is what 'partition by orderdate' means.

Within a partition, the rows are ordered by orderdate, as per the second clause of '(partition by orderdate order by orderdate asc)'. That isn't very useful, as all rows within a partition are going to have the same orderdate. Because of that, the ordering of the rows within a partition is random. Try ordering by salesorderid within the partition by clause to have a more reproducable result.

row_number() just returns the row's ordering within each partition

The partition by orderdate means that you're only comparing records to other records with the same orderdate . For example, of the five records with orderdate = '08/01/2001' , one will have row_number() = 1 , one will have row_number() = 2 , and so on.

The order by orderdate asc means that, within a partition, row-numbers are to be assigned in order of orderdate . In your example that has no effect, because you're already partitioning by orderdate , so all records within a partition will have the same orderdate . (It would be like writing SELECT ... FROM t WHERE c = 6 ORDER BY c : all selected records have the same value of c , so the ORDER BY c does nothing.) So, within a partition, the assignment of row_number() is arbitrary: each row will have a different number, but there are no guarantees about which row will have which number.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM