
SQL - UDF to determine index event in a sequence with condition

I have a table of subject records in which each subject has one or more event dates:

SUBJECT EVENT_DATE
1 2020-01-01
1 2020-11-06
2 2021-09-24
2 2021-09-26
3 2022-04-01
4 2020-05-01
4 2021-05-25
4 2021-05-31
4 2022-11-31
4 2022-12-02

For each subject and event, I need to determine the first event (the "index date") within a 30-day period. If a subject has a single event, or several events separated by more than 30 days, the index date is simply the event date itself. If there are two or more events within 30 days, the index date is the first event in that sequence. Note that an event date must not have an index date more than 30 days earlier. I tried to build a chain of events with the lag function, but when there are more than two events I also need to compare against the earlier ones. For subject 4, the index dates should be:

SUBJECT EVENT_DATE INDEX_DATE
4 2020-05-01 2020-05-01
4 2021-05-25 2021-05-25
4 2021-05-31 2021-05-25
4 2022-11-31 2022-11-31
4 2022-12-02 2022-11-31

This does need a UDTF, but unfortunately it is a bit more complex than carrying a single piece of state: it requires a row buffer. Because rows are buffered, the function runs the risk of running out of memory if too many rows fall within any 30-day window.
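To illustrate why carrying a single value is not enough, here is a minimal sketch in plain JavaScript that runs outside Snowflake. It uses hypothetical day offsets (0, 20 and 40) rather than the question's data, and the function names `carryOnly` and `withBuffer` are made up for the illustration. When day 0 ages out at day 40, the earliest event still within 30 days is day 20, and only the buffered version still remembers it.

```javascript
// Hypothetical illustration: events are day offsets, the window is `ttl` days.

// Carries only the current first date. When that date expires there is
// nothing to fall back on except the current row.
function carryOnly(days, ttl) {
    let first = null;
    return days.map(function (d) {
        if (first === null || d - first > ttl) {
            first = d;
        }
        return first;
    });
}

// Keeps every date still inside the window, so the next-oldest date
// is available when the head of the buffer expires.
function withBuffer(days, ttl) {
    const buffer = [];
    return days.map(function (d) {
        buffer.push(d);
        while (d - buffer[0] > ttl) {
            buffer.shift();
        }
        return buffer[0];
    });
}

console.log(carryOnly([0, 20, 40], 30));  // [ 0, 0, 40 ]
console.log(withBuffer([0, 20, 40], 30)); // [ 0, 0, 20 ]
```

The two versions agree whenever events cluster neatly, as they do in the sample data, and differ only when a chain of events stretches past the window.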

Here's the code for the UDTF and its usage:

create or replace transient table T1 as 
select 
COLUMN1::int as "SUBJECT",
COLUMN2::date as "EVENT_DATE"
from (values
(1,'2020-01-01'),
(1,'2020-11-06'),
(2,'2021-09-24'),
(2,'2021-09-26'),
(3,'2022-04-01'),
(4,'2020-05-01'),
(4,'2021-05-25'),
(4,'2021-05-31'),
(4,'2022-11-30'), -- Changed from 2022-11-31
(4,'2022-12-02')
);

create or replace function "FIRST_DATE"(ROW_DATE date, DAYS_TO_LIVE float)
returns table (FIRST_DATE date)
language javascript
as
$$
{
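    initialize: function (argumentInfo, context) {
        // Dates in the current partition that are still inside the window.
        this.buffer = [];
    },
    processRow: function (row, rowWriter, context) {
        const DAY = 86400000; // Milliseconds per day.
        const rowDate = new Date(row.ROW_DATE);
        this.buffer.push(rowDate);
        // Oldest date that still counts as being inside the window.
        const ttlDate = new Date(rowDate.getTime() - DAY * row.DAYS_TO_LIVE);
        // Drop buffered dates that have aged out. For a non-negative
        // DAYS_TO_LIVE the current row never ages out, so the loop stops.
        while (this.buffer[0] < ttlDate) {
            this.buffer.shift();
        }
        // The oldest remaining date is the index date for this row.
        rowWriter.writeRow({FIRST_DATE: this.buffer[0]});
    },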
}
$$;

select *
from T1, table(FIRST_DATE(EVENT_DATE, 30::float) over (partition by SUBJECT order by EVENT_DATE))
order by SUBJECT, EVENT_DATE
;

Result:

SUBJECT EVENT_DATE FIRST_DATE
1 2020-01-01 2020-01-01
1 2020-11-06 2020-11-06
2 2021-09-24 2021-09-24
2 2021-09-26 2021-09-24
3 2022-04-01 2022-04-01
4 2020-05-01 2020-05-01
4 2021-05-25 2021-05-25
4 2021-05-31 2021-05-25
4 2022-11-30 2022-11-30
4 2022-12-02 2022-11-30
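
To check the windowing logic without a Snowflake session, the body of processRow can be lifted into a plain JavaScript function and run in Node. This is only a sketch: `firstDates` is a hypothetical helper name, the dates are passed as ISO strings rather than Snowflake DATE values, and the input is assumed to be one partition already sorted by date.

```javascript
// Same windowing logic as the UDTF's processRow, as a plain function.
// `firstDates` is a hypothetical helper, not part of Snowflake.
function firstDates(isoDates, daysToLive) {
    const DAY = 86400000; // Milliseconds per day.
    const buffer = [];
    return isoDates.map(function (iso) {
        const rowDate = new Date(iso); // "YYYY-MM-DD" parses as UTC midnight.
        buffer.push(rowDate);
        const ttlDate = new Date(rowDate.getTime() - DAY * daysToLive);
        // Drop dates that are more than daysToLive days before this row.
        while (buffer[0] < ttlDate) {
            buffer.shift();
        }
        return buffer[0].toISOString().slice(0, 10);
    });
}

// Subject 4 from T1, already ordered by EVENT_DATE.
const subject4 = ["2020-05-01", "2021-05-25", "2021-05-31", "2022-11-30", "2022-12-02"];
console.log(firstDates(subject4, 30));
// [ '2020-05-01', '2021-05-25', '2021-05-25', '2022-11-30', '2022-11-30' ]
```

Because the comparison is a strict `<`, an event exactly 30 days after the oldest buffered date still falls inside the window.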

