简体   繁体   中英

Using postgres window functions with different order by clause

I have an issue using multiple order by in postgres window functions. Here is short example. Select total number of rows, N first and N last rows from in a single query. (It is not the task that I want to achieve just an example of the issue) Is it expected behavior or a bug in postgres? I'm using postgres 9.6

select generate_series(1, 10) id
into q;

select
       count(*) over (),
       lag(id , 0) over (order by id asc) a,
       lag(id , 0) over (order by id desc) d
from q
limit 5;

Output:

10,10,10
10,9,9
10,8,8
10,7,7
10,6,6

Expected:

10,1,10
10,2,9
10,3,8
10,4,7
10,5,6

Code work good if only N first or only N last rows are selected.

I once had a similar problem which had the same explanation: https://stackoverflow.com/a/48668220/3984221


Explanation of the behaviour:

demo:db<>fiddle

You can explain this when you have a look into the EXPLAIN output:

> | QUERY PLAN                                                                                                                   |
> | :--------------------------------------------------------------------------------------------------------------------------- |
> | WindowAgg  (cost=368.69..445.19 rows=2550 width=20) (actual time=0.146..0.150 rows=10 loops=1)                               |
> |   -&gt;  WindowAgg  (cost=368.69..413.32 rows=2550 width=12) (actual time=0.128..0.136 rows=10 loops=1)                         |
> |         -&gt;  Sort  (cost=368.69..375.07 rows=2550 width=8) (actual time=0.126..0.128 rows=10 loops=1)                         |
> |               Sort Key: id                                                                                                   |
> |               Sort Method: quicksort  Memory: 25kB                                                                           |
> |               -&gt;  WindowAgg  (cost=179.78..224.41 rows=2550 width=8) (actual time=0.048..0.056 rows=10 loops=1)              |
> |                     -&gt;  Sort  (cost=179.78..186.16 rows=2550 width=4) (actual time=0.033..0.034 rows=10 loops=1)             |
> |                           Sort Key: id DESC                                                                                  |
> |                           Sort Method: quicksort  Memory: 25kB                                                               |
> |                           -&gt;  Seq Scan on q  (cost=0.00..35.50 rows=2550 width=4) (actual time=0.013..0.014 rows=10 loops=1) |
> | Planning Time: 0.292 ms                                                                                                      |
> | Execution Time: 0.445 ms                                                                                                     |

Here you can see: First there is a SORT Key: id DESC . So everything is ordered in DESC order. If you have only the DESC ordered function, this would be your result, as you already saw. Now, you have a second window function. So, the entire result will be sorted a second time, into the ASC order, incl. your first result. So, your first lag() result 10, 9, 8, 7, 6, ... will be ordered back into 1, 2, 3, 4, 5, ... Afterwards the second lag() result will be added.

However, your specific result for your lag() function is explainable, of course: You don't shift your data, so you get the current value. You can cross check this (as I did in the fiddle above), when you turn your 0 shift value into 1 . Then your DESC lag() will return 2 for id 1 , but ASC gives NULL . Everything's fine.

So, to create your expected output, you need another approach, eg using row_number() to add the row count in ASC and DESC order and filter them afterwards:

demo:db<>fiddle

SELECT
    COUNT(*) OVER (),
    a.id,
    d.id
FROM ( 
   select
       id,
       row_number() over (order by id asc)
   from q
) a
JOIN ( 
   select
       id,
       row_number() over (order by id desc)
   from q
) d ON a.row_number = d.row_number
LIMIT 5

The second parameter of lag() determines how many rows to look back. So lag(id, 0) means to look back zero rows, which makes lag(id, 0) equivalent of just id . So the result you get is perfectly sane.

Do get what you want, you can use row_number() to join on.

SELECT count(*) OVER (),
       x1.id,
       x2.id
       FROM (SELECT id,
                    row_number() OVER (ORDER BY id ASC) r
                    FROM q) x1
            INNER JOIN (SELECT id,
                               row_number() OVER (ORDER BY id DESC) r
                               FROM q) x2
                       ON x2.r = x1.r
      ORDER BY x1.r
      LIMIT 5;

Order from one of the over clause are applied to the data. But data only sorted once. The desired behavior could be achieved by following query.

select count(*) over (),
       (array_agg(id) over (order by id asc rows between unbounded preceding and unbounded  following))[row_number() over ()] a,
       (array_agg(id) over (order by id desc rows between unbounded preceding and unbounded  following))[row_number() over ()] d
from q
order by id
limit 5

maybe memory ineffective for large tables because array_agg construct array from all rows.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM