Oracle SQL - How to calculate null values based on previous non-null record

I have some columns in a table like this:

id | date      | change | end_value 
 1 | 03-JAN-20 | -9     |       
 2 | 04-JAN-20 | 12     |           
 3 | 05-JAN-20 | -43    | 523       
 4 | 06-JAN-20 | 0      |           
 5 | 07-JAN-20 | 5      |           
 6 | 08-JAN-20 | 10     |           
 7 | 09-JAN-20 | 3      | 505       
 8 | 10-JAN-20 | 4      |           
 9 | 11-JAN-20 | -3     |           
 10| 12-JAN-20 | 1      | 503       
 11| 13-JAN-20 | -6     |           

I need to fill in all the null values in the end_value column by taking the previous non-null value and subtracting the running sum of change since that row. When end_value is not null, keep the value as it is.

The result would be something like this:

id | date      | change | end_value | result
 1 | 03-JAN-20 | -9     |           | 492 (=523 - 43 + 12)
 2 | 04-JAN-20 | 12     |           | 480 (=523 - 43)
 3 | 05-JAN-20 | -43    | 523       | 523 
 4 | 06-JAN-20 | 0      |           | 523 (=523 - 0)
 5 | 07-JAN-20 | 5      |           | 518 (=523 - 0 - 5)
 6 | 08-JAN-20 | 10     |           | 508 (=523 - 0 - 5 - 10)
 7 | 09-JAN-20 | 3      | 505       | 505 
 8 | 10-JAN-20 | 4      |           | 501 (=505 - 4)
 9 | 11-JAN-20 | -3     |           | 504 (=505 - 4 + 3)
 10| 12-JAN-20 | 1      | 503       | 503
 11| 13-JAN-20 | -6     |           | 509 (=503 + 6)

I figured I might need to use last_value with ignore nulls, but I can't figure out the running-minus part.

Thanks for the help!

The solution below anchors on the first non-null end_value when sorted by date, i.e., it derives every row from that one value and ignores the other end_value entries.

with t (sid, dt,change,end_value) as ( 
 select 1 , to_date('03-JAN-20', 'dd-MON-rr') , -9     , null    from dual union all   
 select 2 , to_date('04-JAN-20', 'dd-MON-rr') , 12     , null    from dual union all       
 select 3 , to_date('05-JAN-20', 'dd-MON-rr') , -43    , 523     from dual union all       
 select 4 , to_date('06-JAN-20', 'dd-MON-rr') , 0      , null    from dual union all       
 select 5 , to_date('07-JAN-20', 'dd-MON-rr') , 5      , null    from dual union all       
 select 6 , to_date('08-JAN-20', 'dd-MON-rr') , 10     , null    from dual union all       
 select 7 , to_date('09-JAN-20', 'dd-MON-rr') , 3      , 505     from dual union all       
 select 8 , to_date('10-JAN-20', 'dd-MON-rr') , 4      , null    from dual union all       
 select 9 , to_date('11-JAN-20', 'dd-MON-rr') , -3     , null    from dual union all       
 select 10, to_date('12-JAN-20', 'dd-MON-rr') , 1      , 503     from dual union all       
 select 11, to_date('13-JAN-20', 'dd-MON-rr') , -6     , null    from dual 
 )
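 select sid, dt, change, end_value, nvl(yy,yyy) rslt from (
   select a.* 
   -- yy: from the anchor date xx onward, start at its end_value and subtract each later change
   , sum(case when dt = xx then end_value when dt > xx then -change end) over ( order by dt) yy
   -- yyy: from xx backward, start at its end_value and add the following row's change (ld)
   , sum(case when dt = xx then end_value when dt < xx then ld end) over ( order by dt desc) yyy
   from (
     select t.*
     -- xx: earliest date that has a non-null end_value
     , min(dt) keep (dense_rank first order by nvl2(end_value,0,1)) over () xx
     -- ld: change of the next row by date
     , lead(change) over (order by dt) ld
     from t
   ) a
 ) b
 order by dt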

This is a type of gaps-and-islands problem. The solution is actually pretty simple:

  • Define the islands by counting the number of non-NULL end_value entries on or after each row.
  • Within each group, do a cumulative sum of change and add to the end_value for that group.

There is a little trick because you don't want the change for the current row. That is easily handled by subtracting it out of the cumulative sum:

select t.*,
       (max(end_value) over (partition by grp order by dt desc) +
        sum(change) over (partition by grp order by dt desc) -
        change
      ) as new_end_value
from (select t.*, count(end_value) over (order by dt desc) as grp
      from t
     ) t
order by dt;

Here is a db<>fiddle.
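
Note that the query above returns NULL for any row after the last non-NULL end_value (id 11 in the sample), because that group has no end_value to count back from. The following is a sketch of one way to cover those rows, not part of the original answer: it fills forward from the previous non-NULL end_value and falls back to the backward fill only for rows before the first one. It assumes the same table t(sid, dt, change, end_value); the names grp_fwd, grp_bwd, fwd and bwd are made up for illustration.

```sql
select sid, dt, change, end_value,
       coalesce(end_value, fwd, bwd) as new_end_value
from (select g.*,
             -- fwd: previous non-NULL end_value minus the changes accumulated since it
             max(end_value) over (partition by grp_fwd)
               - sum(nvl2(end_value, 0, change))
                   over (partition by grp_fwd order by dt) as fwd,
             -- bwd: next non-NULL end_value plus the changes of the rows after this one
             max(end_value) over (partition by grp_bwd)
               + sum(change) over (partition by grp_bwd order by dt desc)
               - change as bwd
      from (select t.*,
                   -- number of non-NULL end_values on or before this row
                   count(end_value) over (order by dt)      as grp_fwd,
                   -- number of non-NULL end_values on or after this row
                   count(end_value) over (order by dt desc) as grp_bwd
            from t
           ) g
     ) f
order by dt;
```

Preferring fwd over bwd follows the question's wording ("previous non-null value"). On the sample data both directions agree wherever both are defined, so the choice only matters if the stored end_value entries are inconsistent with the changes between them.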

If you want to update the value, use merge:

merge into t using
      (select t.*,
              (max(end_value) over (partition by grp order by dt desc) +
               sum(change) over (partition by grp order by dt desc) -
               change
             ) as new_end_value
       from (select t.*, count(end_value) over (order by dt desc) as grp
             from t
            ) t
      ) src
      on (src.sid = t.sid)
when matched then update
    set end_value = src.new_end_value;

You can use a PL/SQL cursor and store the running sum of CHANGE in a variable. Something like this:

DECLARE
    -- Walk the rows in date order.
    CURSOR cur IS
    SELECT
        id,
        change,
        end_value
    FROM
        test
    ORDER BY
        "DATE";

    v_baseline   NUMBER;       -- last non-null end_value seen; NULL until one is found
    v_change     NUMBER := 0;  -- running sum of change since that baseline
BEGIN
    FOR rec IN cur LOOP
        IF rec.end_value IS NOT NULL THEN
            -- A stored value resets the baseline and the running sum.
            v_baseline := rec.end_value;
            v_change := 0;
        ELSIF v_baseline IS NOT NULL THEN
            -- Rows before the first baseline are skipped rather than filled from 0.
            v_change := v_change + rec.change;
            UPDATE test
            SET
                end_value = v_baseline - v_change
            WHERE
                id = rec.id;
            -- COMMIT;
        END IF;
    END LOOP;
END;
/

Note that you wrote that the column should be filled "based on the previous non-null value", but in your example the first two rows are filled based on the next non-null value (if I understood correctly), so this code does not handle them. Anyway, you can adjust it according to your needs.
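
One possible adjustment, as a sketch only: a second block that walks the rows in descending date order and steps back from the following row's value. It assumes the same test table and is meant to run on the original data, before the block above. The variable names are made up for illustration. It fills every NULL row that has a later non-null end_value, so it works from the next stored value rather than the previous one. On the sample data the two agree. Rows after the last stored value stay NULL and are left for the forward block.

```sql
DECLARE
    v_next_value    NUMBER;  -- end_value of the following row (by date), NULL if unknown
    v_next_change   NUMBER;  -- change of the following row
    v_value         NUMBER;  -- value worked out for the current row
BEGIN
    FOR rec IN (SELECT id, change, end_value
                FROM test
                ORDER BY "DATE" DESC) LOOP
        v_value := rec.end_value;
        IF v_value IS NULL AND v_next_value IS NOT NULL THEN
            -- Undo the following row's change to step back one row.
            v_value := v_next_value + v_next_change;
            UPDATE test
            SET end_value = v_value
            WHERE id = rec.id;
        END IF;
        -- Remember this row for the next (earlier) one.
        v_next_value  := v_value;
        v_next_change := rec.change;
    END LOOP;
END;
/
```

For the first two sample rows this gives 480 (= 523 - 43) and 492 (= 480 + 12), matching the expected result in the question.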
