简体   繁体   中英

fill in “missing” values using match_recognize

A question here, on SO , asked for resolving a rather common use case of filling in "missing" (based on defined criteria of being missing) values with non-"missing" values from "previous" (based on defined criteria of ordering) rows.

My usual approach to this problem is to

  • remap ( decode / case ) the "missing" values to NULLs,
  • use the last_value() analytic function on the non-"missing"-value, with ignore nulls and the window of all preceding rows up to the current row in the defined ordering.

Ie, given a (taken from the original post) input set of rows my_table of ...

ORD  COL1  COL2  COL3  COL4
---  ----  ----  ----  ----
  1     A     0     1     5
  2     B     0     4     0
  3     C     2     0     0
  4     D     0     0     0
  5     E     3     5     0
  6     F     0     3     0
  7     G     0     3     1
  8     A     0     1     5
  9     E     3     5     0

..., an ordering of order by ord asc , a criterion for a value being "missing" being the value is negative or zero ( case when colX <= 0 then null else colX end for X of {2,3,4}), the query of ...

select X.ord, X.col1,
    nvl(last_value(case when col2 > 0 then col2 end) ignore nulls over (order by ord), col2) as col2,
    nvl(last_value(case when col3 > 0 then col3 end) ignore nulls over (order by ord), col3) as col3,
    nvl(last_value(case when col4 > 0 then col4 end) ignore nulls over (order by ord), col4) as col4
from my_table X
order by ord;

... would yield the desired result of ...

ORD  COL1  COL2  COL3  COL4
---  ----  ----  ----  ----
  1     A     0     1     5
  2     B     0     4     5
  3     C     2     4     5
  4     D     2     4     5
  5     E     3     5     5
  6     F     3     3     5
  7     G     3     3     1
  8     A     3     1     5
  9     E     3     5     5

Lukas Eder proposed a beautiful alternative solution with the model clause, working with the fact that the model operation in his case works inductively (based on my observation of the SQL MODEL ORDERED operation in his query's execution plan), from the first row to the last row in the desired ordering, resulting in the row n having the "missing" values already filled in when the row n+1 is being processed.

One of Lukas's comments also mentioned the possibility of using the match_recognize clause of Oracle 12c. My attempts to get this (completely new to me) clause understood and working failed badly. So, ...

A bounty offer! :-)

I'm offering a small bounty to the most elegant match_recognize -based solution to the above problem. All preprocessings (views, inline views, with clauses, ...) of the data with pivot , unpivot , self-joins, model , analytics, aggregations, PL/SQL, whatever, are prohibited. Only standard scalar SQL functions are allowed. I'm interested in pure match_recognize working on the base my_table data source.

I don't think you can accomplish the result using pure MATCH_RECOGNIZE clause. Simply because the values col2 , col3 , col4 are independent on each other but the pattern is evaluated per entire row. So if multiple symbols can be matched (multiple columns have zero or non-zero values in your case) only the first one is matched as classifier and the others are ignored.

For single column there is simple solution:

SELECT
    ORD, COL1, COL2R COL2
FROM
    my_table
MATCH_RECOGNIZE (
     ORDER BY ORD
     MEASURES
        NVL(LAST(V2.COL2), 0) AS COL2R
     ALL ROWS PER MATCH
     PATTERN ((V20*V2+V20*)+)
     DEFINE
        V2  AS  V2.COL2 > 0,
        V20 AS V20.COL2 = 0
     )

Anyway, analytic functions are definitely the best option in your case. Both MODEL and MATCH_RECOGNIZE clauses are meant to solve different problems where analytic functions will not help and although there are well optimized both are more heavy weight than analytic functions.

The condition within the pattern does get evaluated per entire row, so there has to be a condition defined for every eventuality. You can then group appropriate conditions together using the SUBSET clause.

WITH t(col0, col1, col2, col3, col4) AS (
  SELECT 1, 'A', 0, 1, 5 FROM DUAL UNION ALL
  SELECT 2, 'B', 0, 4, 0 FROM DUAL UNION ALL
  SELECT 3, 'C', 2, 0, 0 FROM DUAL UNION ALL
  SELECT 4, 'D', 0, 0, 0 FROM DUAL UNION ALL
  SELECT 5, 'E', 3, 5, 0 FROM DUAL UNION ALL
  SELECT 6, 'F', 0, 3, 0 FROM DUAL UNION ALL
  SELECT 7, 'G', 0, 3, 1 FROM DUAL UNION ALL
  SELECT 8, 'A', 0, 1, 5 FROM DUAL UNION ALL
  SELECT 9, 'E', 3, 5, 0 FROM DUAL
)
SELECT col0, col1, C02, C03, C04
FROM t
match_recognize(
  order by col0
  measures nvl(C02.col2,0) C02,
           nvl(C03.col3,0) C03,
           nvl(C04.col4,0) C04
  all rows per match
  pattern ((C234|C23|C24|C34|C2|C3|C4|X)+)
  subset C02 = (C234, C23, C24, C2),
         C03 = (C234, C23, C34, C3),
         C04 = (C234, C24, C34, C4)
  define C234 as col2>0 and col3>0 and col4>0,
         C23 as col2>0 and col3>0,
         C24 as col2>0 and col4>0,
         C34 as col3>0 and col4>0,
         C2 as col2>0, C3 as col3>0, C4 as col4>0
);

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM