I have table like below
customer | date | end date |
---|---|---|
1 | jan 1 2021 | jan 30 2021 |
1 | jan 2 2021 | jan 31 2021 |
1 | jan 3 2021 | feb 1 2021 |
1 | jan 27 2021 | feb 26 2021 |
1 | feb 3 2021 | mar 5 2021 |
2 | jan 2 2021 | jan 31 2021 |
2 | jan 10 2021 | feb 9 2021 |
2 | feb 10 2021 | mar 12 2021 |
Now, I wanted to update the value in the 'end date' column of a row based on the values in the previous row 'end date' and the current row 'date'. Say if the date in current row < end date of the previous row, I wanted to update the end date of the current row = (end date of the previous row). I Wanted to do this repeated for all the rows (grouped by customer).
I want the output as below. Just need it in the select statement instead of a updating/inserting in a table.
Note - in below as the second row(end date) is updated with the value in the first row (jan 30 2021), now the third row value (jan 3 2021) is evaluated against the updated value in the second row (which is jan 30 2021) but not with the second row value before update (jan 31 2021).
customer | date | end date |
---|---|---|
1 | jan 1 2021 | jan 30 2021 |
1 | jan 2 2021 | jan 30 2021 [updated because current date < previous end date] |
1 | jan 3 2021 | jan 30 2021[updated because current date < previous end date] |
1 | jan 27 2021 | jan 30 2021 [updated because current date < previous end date] |
1 | feb 3 2021 | mar 5 2021 |
2 | jan 2 2021 | jan 31 2021 |
2 | jan 10 2021 | jan 31 2021[updated because current date < previous end date] |
2 | feb 10 2021 | mar 12 2021 |
I think I should go this way. I use the datasource twice just to get the way its needed to perform the operation without updating or inserting into the table.
input table :
1|2021-01-01|2021-01-30
1|2021-01-02|2021-01-31
1|2021-01-03|2021-02-01
1|2021-01-27|2021-02-26
1|2021-02-03|2021-03-05
2|2021-01-02|2021-01-31
2|2021-01-10|2021-02-09
2|2021-02-10|2021-03-12
code :
with num_raw_data as (
SELECT row_number() over(partition by customer)as num, customer,date_init,date_end
FROM `project-id.data-set.table`
), analyzed_data as(
select r.num,
r.customer,
r.date_init,
r.date_end,
case when date_init<(select date_end from num_raw_data where num=r.num-1 and customer=r.customer and EXTRACT(month FROM r.date_init)=EXTRACT(month FROM date_init)) then 1 else 0 end validation
from num_raw_data r
)
select customer,
date_init,
case when validation !=0 then (select MIN(date_end) from analyzed_data where validation=0 and customer=ad.customer and date_init<ad.date_end) else date_end end as date_end
from analyzed_data ad
order by customer,num
output :
1|2021-01-01|2021-01-30
1|2021-01-02|2021-01-30
1|2021-01-03|2021-01-30
1|2021-01-27|2021-01-30
1|2021-02-03|2021-03-05
2|2021-01-02|2021-01-31
2|2021-01-10|2021-01-31
2|2021-02-10|2021-03-12
Using column validation
from analyzed_data
to get to know where I should be looking for changes. I'm not sure if its fast (probably not) but it works for the scenario you bring in your question.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.