简体   繁体   中英

Correct BigQuery SQL to get value from previous partition

I am trying to use a window function to get a value from the previous partition/window frame (rather than from some other row in the same partition). I don't fully understand how I can do this but have tried various combinations of using:

  • partition clause but with LAG() function
  • omitting partition clause and using an ORDER BY + a RANGE 1 PRECEDING specification

Neither give me the expected result though.

My example is made ever so slightly more complicated by the fact that the value belongs to the "month" rather than a row so I don't want to do any aggregation on that value. Thankfully, BigQuery has ANY_VALUE() for this.

My data looks like this:

Year Month Date MonthValue
2022 10 2022-10-30 50
2022 10 2022-10-31 50
2022 11 2022-11-01 60
2022 11 2022-11-01 60

The expected result is:

Year Month Date MonthValue PreviousMonthValue
2022 10 2022-10-30 50 NULL
2022 10 2022-10-31 50 NULL
2022 11 2022-11-01 60 50
2022 11 2022-11-01 60 50

Assuming it's not possible to access values from a different partition/window frame if you use a PARTITION BY clause, I thought the RANGE 1 PRECEDING approach had the best mileage but the challenge with this is you must ORDER BY a single column that is numeric - and my understanding is your choice of column determines what RANGE means. I need to order by "year-month" and expressing this as an integer is tricky. I tried using CAST(FORMAT_DATE("%Y%m%, Date) AS INT64) but I think the problem with this is it bases range's granularity off "Date" which is too granular. I then tried (Year * 100) + Month (which would be logically equivelant to (CONCAT(CAST(Year AS STRING), RIGHT(CONCAT("0", CAST(Month AS STRING)), 2)) AS INT64) but this didn't work either.

ANY_VALUE(MonthlyValue) OVER (ORDER BY (`Year` * 100) + `Month` RANGE 1 PRECEDING) AS PreviousMonthlyValue

I can think of an alternative way of doing this with a self-join where I join to the previous month but I want to achieve it with a window function.

Thanks in advance!

Consider below

select *,
  last_value(MonthValue) over prev_month as PreviousMonthValue
from your_table
window prev_month as (order by 12 * year + month range between 1 preceding and 1 preceding)            

if applied to sample data in your question - output is

在此处输入图像描述

Above assumes all monthValue are same within the month as it is in your sample data - but most likely it is not a case and you want the value at a last day of month - so use below in this case

with endMonthValues as (
  select Year, Month, array_agg(MonthValue order by Date desc limit 1)[offset(0)] MonthValue
  from your_table
  group by Year, Month
)
select a.*, b.MonthValue as PreviousMonthValue
from your_table a
left join endMonthValues b
on 12 * a.year + a.month - 1 = 12 * b.year + b.month

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM