简体   繁体   中英

big query SQL - repeatedly/recursively change a row's column in the select statement based on the values in previous row

I have table like below

customer date end date
1 jan 1 2021 jan 30 2021
1 jan 2 2021 jan 31 2021
1 jan 3 2021 feb 1 2021
1 jan 27 2021 feb 26 2021
1 feb 3 2021 mar 5 2021
2 jan 2 2021 jan 31 2021
2 jan 10 2021 feb 9 2021
2 feb 10 2021 mar 12 2021

Now, I wanted to update the value in the 'end date' column of a row based on the values in the previous row 'end date' and the current row 'date'. Say if the date in current row < end date of the previous row, I wanted to update the end date of the current row = (end date of the previous row). I Wanted to do this repeated for all the rows (grouped by customer).

I want the output as below. Just need it in the select statement instead of a updating/inserting in a table.

Note - in below as the second row(end date) is updated with the value in the first row (jan 30 2021), now the third row value (jan 3 2021) is evaluated against the updated value in the second row (which is jan 30 2021) but not with the second row value before update (jan 31 2021).

customer date end date
1 jan 1 2021 jan 30 2021
1 jan 2 2021 jan 30 2021 [updated because current date < previous end date]
1 jan 3 2021 jan 30 2021[updated because current date < previous end date]
1 jan 27 2021 jan 30 2021 [updated because current date < previous end date]
1 feb 3 2021 mar 5 2021
2 jan 2 2021 jan 31 2021
2 jan 10 2021 jan 31 2021[updated because current date < previous end date]
2 feb 10 2021 mar 12 2021

I think I should go this way. I use the datasource twice just to get the way its needed to perform the operation without updating or inserting into the table.

input table :

1|2021-01-01|2021-01-30
1|2021-01-02|2021-01-31 
1|2021-01-03|2021-02-01
1|2021-01-27|2021-02-26
1|2021-02-03|2021-03-05
2|2021-01-02|2021-01-31
2|2021-01-10|2021-02-09
2|2021-02-10|2021-03-12

code :

with num_raw_data as (
    SELECT row_number() over(partition by customer)as num, customer,date_init,date_end  
    FROM `project-id.data-set.table`
), analyzed_data as(
    select r.num,
    r.customer,
    r.date_init,
    r.date_end,
    case when date_init<(select date_end from num_raw_data where num=r.num-1 and customer=r.customer and EXTRACT(month FROM r.date_init)=EXTRACT(month FROM date_init)) then 1 else 0 end validation
    from num_raw_data r 
)
select customer,
date_init,
case when validation !=0 then (select MIN(date_end) from analyzed_data where validation=0 and customer=ad.customer and date_init<ad.date_end) else date_end end as date_end
from analyzed_data ad
order by customer,num

output :

1|2021-01-01|2021-01-30
1|2021-01-02|2021-01-30
1|2021-01-03|2021-01-30 
1|2021-01-27|2021-01-30 
1|2021-02-03|2021-03-05 
2|2021-01-02|2021-01-31
2|2021-01-10|2021-01-31 
2|2021-02-10|2021-03-12

Using column validation from analyzed_data to get to know where I should be looking for changes. I'm not sure if its fast (probably not) but it works for the scenario you bring in your question.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM