There are many ways to get a lagged value of a certain column in SQL, e.g.:
WITH CTE AS (
    SELECT
        ROW_NUMBER() OVER (ORDER BY columns_to_order_by) AS rownum,
        value
    FROM table
)
SELECT
    cur.value - prev.value
FROM CTE cur
INNER JOIN CTE prev ON prev.rownum = cur.rownum - 1
or:
select variable_of_interest
     , lag(variable_of_interest, 1)
         over(partition by some_group
              order by variable_1,...,variable_n)
         as lag_variable_of_interest
from DATA
I use the second version, but my code runs very slowly when "lagging" many variables, i.e. when my code becomes:
select variable_of_interest_1
     , variable_of_interest_2
     , variable_of_interest_3
     , lag(variable_of_interest_1, 1)
         over(partition by some_group
              order by variable_1,...,variable_n)
         as lag_variable_of_interest_1
     , lag(variable_of_interest_2, 1)
         over(partition by some_group
              order by variable_1,...,variable_n)
         as lag_variable_of_interest_2
     , lag(variable_of_interest_3, 1)
         over(partition by some_group
              order by variable_1,...,variable_n)
         as lag_variable_of_interest_3
from DATA
I wonder, is this because each lag function must partition and order the whole data set on its own, even though they all use the same partition and order?
I am not 100% sure about how DB2 optimizes such queries. If it executes each lag independently, then there is definitely room to improve the optimizer.
One method you could use is lag() with a join on the primary key:
select t.*, tprev.*
from (select t.*, lag(id) over ( . . . ) as prev_id
      from t
     ) t left join
     t tprev
     on t.prev_id = tprev.id ;
From what you describe, this might be the most efficient method to do what you want.
This should be more efficient than row_number() because the join can make use of an index.
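As a rough sketch of how this could be applied to several lagged columns at once, assume a hypothetical table t with primary key id, a grouping column some_group, an ordering column ord, and value columns v1, v2, v3 (none of these names come from the question). Only one window function is evaluated, and the previous row's values are fetched through the join:

```sql
-- Hypothetical schema, for illustration only:
--   t(id primary key, some_group, ord, v1, v2, v3)
select cur.id
     , cur.v1
     , cur.v2
     , cur.v3
     , prev.v1 as lag_v1
     , prev.v2 as lag_v2
     , prev.v3 as lag_v3
from (select t.*
           , lag(id) over (partition by some_group order by ord) as prev_id
      from t
     ) cur
left join t prev
       on prev.id = cur.prev_id   -- look up the previous row by primary key
;
```

The left join keeps the first row of each group, whose prev_id is null, so its lag_* columns come back as null just as lag() would return them. Whether this actually beats several lag() calls depends on the data and the optimizer, so it is worth comparing the plans of both forms.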
Db2 will only sort the data once, if all OLAP functions use the same PARTITION BY and ORDER BY. You can confirm this by looking at an explain plan.
create table data(v1 int, v2 int, v3 int, g1 int, g2 int, o1 int, o2 int) organize by row
;
explain plan for
select g1
, g2
, o1
, o2
, v1
, v2
, v3
, lag(v1) over(partition by g1, g2 order by o1, o2 ) as lag_v1
, lag(v2) over(partition by g1, g2 order by o1, o2 ) as lag_v2
, lag(v3) over(partition by g1, g2 order by o1, o2 ) as lag_v3
from
data
;
will give the following plan (using db2exfmt -1 -d $DATABASE). You can see there is only one SORT operator:
Access Plan:
-----------
Total Cost: 14.839
Query Degree: 4
Rows
RETURN
( 1)
Cost
I/O
|
1000
LMTQ
( 2)
14.839
2
|
1000
TBSCAN
( 3)
14.5555
2
|
1000
SORT
( 4)
14.5554
2
|
1000
TBSCAN
( 5)
14.2588
2
|
1000
TABLE: PAUL
DATA
Q1
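For contrast, here is a sketch of a query against the same data table in which the two lag() calls use different window specifications. The expectation, which should be checked with db2exfmt rather than taken as a measured result, is that the plan can no longer satisfy both functions with a single ordering and will show more than one SORT operator:

```sql
explain plan for
select g1
     , g2
     , o1
     , o2
     , v1
     , v2
     , lag(v1) over(partition by g1, g2 order by o1, o2) as lag_v1
     , lag(v2) over(partition by g1 order by o2) as lag_v2   -- different window
from
    data
;
```

If the slow query mixes window specifications like this, aligning the PARTITION BY and ORDER BY clauses where the logic allows is one thing to try.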
By the way, if you post a question with a real SQL query (along with some DDL and some idea of the data volumes), we might be able to suggest things that could improve the performance of getting lagged values. It is difficult to advise in detail without seeing a better example.