Impute all null values with the most frequent value of the corresponding column in Oracle SQL with a single UPDATE statement

I'm trying to impute all the null values in an Oracle database table. Suppose the table contains the following rows:

ID    Col1    Col2
-------------------
1     Male    USA
2     Male    USA
3     Female  Russia
4     (null)  USA
5     Male    (null)
6     Male    USA
7     Female  USA
8     (null)  Canada
9     Male    USA

Here, "Male" is the most frequent value in Col1 and "USA" is the most frequent value in Col2. I want all the null values in Col1 to be replaced by "Male" and all the null values in Col2 to be replaced by "USA". In case of a tie, any of the tied values can be used.

So the final table will look like this:

ID    Col1    Col2
-------------------
1     Male    USA
2     Male    USA
3     Female  Russia
4     Male    USA
5     Male    USA
6     Male    USA
7     Female  USA
8     Male    Canada
9     Male    USA

This is what I've done so far:

UPDATE tablename
    SET
        col1 = (
            SELECT
                col1
            FROM
                tablename
            GROUP BY
                col1
            ORDER BY
                COUNT(*) DESC
            FETCH FIRST 1 ROWS ONLY
        )
    WHERE 
        col1 IS NULL;


UPDATE tablename
    SET
        col2 = (
            SELECT
                col2
            FROM
                tablename
            GROUP BY
                col2
            ORDER BY
                COUNT(*) DESC
            FETCH FIRST 1 ROWS ONLY
        )
    WHERE 
        col2 IS NULL;

Here I find the most frequent value for each column and update that column with it. This works fine for a table with only 2 columns, but for a table with more than 20 columns it becomes messy, since every column needs its own UPDATE statement. Is there a better way to do this?

Compute stats_mode for each column in a separate subquery, then apply nvl to each column over a cross join with that subquery. Like this:

with
  inputs (id, col1, col2) as (
    select 1, 'Male'  , 'USA'    from dual union all
    select 2, 'Male'  , 'USA'    from dual union all
    select 3, 'Female', 'Russia' from dual union all
    select 4, null    , 'USA'    from dual union all
    select 5, 'Male'  , (null)   from dual union all
    select 6, 'Male'  , 'USA'    from dual union all
    select 7, 'Female', 'USA'    from dual union all
    select 8, null    , 'Canada' from dual union all
    select 9, 'Male'  , 'USA'    from dual
  )
select i.id,
       nvl(i.col1, m.col1_mode) as col1, 
       nvl(i.col2, m.col2_mode) as col2
from   inputs i cross join
       (select stats_mode(col1) as col1_mode,
               stats_mode(col2) as col2_mode from inputs) m
;

        ID COL1   COL2  
---------- ------ ------
         1 Male   USA   
         2 Male   USA   
         3 Female Russia
         4 Male   USA   
         5 Male   USA   
         6 Male   USA   
         7 Female USA   
         8 Male   Canada
         9 Male   USA   
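
The query above only selects the imputed values. Since the question asks for a single UPDATE, the same idea can probably be carried over as in the sketch below. It is untested and reuses the names tablename, col1 and col2 from the question; it also assumes that stats_mode skips nulls when counting, which is the documented behaviour for Oracle aggregate functions but is worth checking on your version.

```sql
-- Sketch only: one UPDATE that fills every null with its column's mode.
-- The inline view m computes all modes in a single pass over the table.
update tablename t
set    (col1, col2) = (
         select nvl(t.col1, m.col1_mode),   -- keep existing value if not null
                nvl(t.col2, m.col2_mode)
         from   (select stats_mode(col1) as col1_mode,
                        stats_mode(col2) as col2_mode
                 from   tablename) m
       )
-- Only touch rows that actually have something to impute.
where  t.col1 is null
   or  t.col2 is null;
```

For a wider table, each extra column adds one entry to the SET list, one nvl line, one stats_mode line and one condition to the WHERE clause, so the statement grows linearly but stays a single UPDATE.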
