Impute all null values with the most frequent value of each column in Oracle SQL with a single UPDATE statement

I'm trying to impute all the null values in an Oracle database table. Suppose the table contains the following rows:

ID    Col1    Col2
-------------------
1     Male    USA
2     Male    USA
3     Female  Russia
4     (null)  USA
5     Male    (null)
6     Male    USA
7     Female  USA
8     (null)  Canada
9     Male    USA

Here, "Male" is the most frequent value in Col1 and "USA" is the most frequent value in Col2. I want every null value in Col1 to be replaced by "Male" and every null value in Col2 to be replaced by "USA". In case of a tie, any of the tied values may be used.

So the final table will look like this:

ID    Col1    Col2
-------------------
1     Male    USA
2     Male    USA
3     Female  Russia
4     Male    USA
5     Male    USA
6     Male    USA
7     Female  USA
8     Male    Canada
9     Male    USA

So far what I've done is this.

UPDATE tablename
    SET
        col1 = (
            SELECT
                col1
            FROM
                tablename
            WHERE
                col1 IS NOT NULL  -- keep the NULL group from winning the count
            GROUP BY
                col1
            ORDER BY
                COUNT(*) DESC
            FETCH FIRST 1 ROWS ONLY
        )
    WHERE
        col1 IS NULL;

UPDATE tablename
    SET
        col2 = (
            SELECT
                col2
            FROM
                tablename
            WHERE
                col2 IS NOT NULL  -- keep the NULL group from winning the count
            GROUP BY
                col2
            ORDER BY
                COUNT(*) DESC
            FETCH FIRST 1 ROWS ONLY
        )
    WHERE
        col2 IS NULL;

What I've done here is find the most frequent value of each column and update that column's nulls with it, one statement per column. This works fine for a table with only 2 columns, but for a table with more than 20 columns it becomes messy. Is there a better way to do this?

Compute STATS_MODE for each column in a separate subquery, then apply NVL to each column over a cross join with that one-row result. Like this:

with
  inputs (id, col1, col2) as (
    select 1, 'Male'  , 'USA'    from dual union all
    select 2, 'Male'  , 'USA'    from dual union all
    select 3, 'Female', 'Russia' from dual union all
    select 4, null    , 'USA'    from dual union all
    select 5, 'Male'  , null     from dual union all
    select 6, 'Male'  , 'USA'    from dual union all
    select 7, 'Female', 'USA'    from dual union all
    select 8, null    , 'Canada' from dual union all
    select 9, 'Male'  , 'USA'    from dual
  )
select i.id,
       nvl(i.col1, m.col1_mode) as col1, 
       nvl(i.col2, m.col2_mode) as col2
from   inputs i cross join
       (select stats_mode(col1) as col1_mode,
               stats_mode(col2) as col2_mode from inputs) m
;

        ID COL1   COL2  
---------- ------ ------
         1 Male   USA   
         2 Male   USA   
         3 Female Russia
         4 Male   USA   
         5 Male   USA   
         6 Male   USA   
         7 Female USA   
         8 Male   Canada
         9 Male   USA   
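
The query above only returns the imputed rows; it does not modify the table. A minimal sketch of the same idea written as one UPDATE follows, assuming the table is named tablename with columns col1 and col2 as in the question. It relies on STATS_MODE ignoring nulls (like other Oracle aggregate functions) and on the inner subquery returning exactly one row because it aggregates without GROUP BY. Treat it as an untested illustration rather than a definitive solution.

```sql
-- Sketch only: table and column names are taken from the question.
UPDATE tablename t
SET (col1, col2) = (
        -- keep existing values, fill nulls with the column's mode
        SELECT NVL(t.col1, m.col1_mode),
               NVL(t.col2, m.col2_mode)
        FROM (
            -- one row holding the most frequent non-null value per column
            SELECT STATS_MODE(col1) AS col1_mode,
                   STATS_MODE(col2) AS col2_mode
            FROM   tablename
        ) m
    )
-- touch only rows that actually have something to impute
WHERE t.col1 IS NULL
   OR t.col2 IS NULL;
```

For a wider table, the pattern repeats per column: one entry in the SET list, one NVL in the outer select, one STATS_MODE in the inner select, and one IS NULL test in the WHERE clause. A column that contains only nulls stays null, since STATS_MODE has no value to return for it.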
