Count the number of distinct values of each row (SQL)

How can I create a new column that returns the number of distinct values in each row of my table? For instance:

ID   Description   Pay1    Pay2   Pay3    #UniquePays     
1    asdf1         10      20     10      2
2    asdf2         0       10     20      3
3    asdf3         100     100    100     1
4    asdf4                 0      10      3

The query may return more than 1 million rows, so it needs to be reasonably efficient. There are 8 'Pay' columns in total, each of which is either NULL or an integer. Also note that '0' should be counted as distinct from NULL.
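To pin down the expected result, here is a minimal Python sketch of the counting rule described above. It is not a database solution. It assumes each row's Pay values arrive as a tuple, with None standing in for NULL, and the helper name `unique_pays` is made up for illustration. NULL is treated as one more value, which is what the sample output implies (row 4 counts 3).

```python
def unique_pays(pays):
    """Count distinct values in one row; None (NULL) counts as a value of its own."""
    return len(set(pays))

# Sample rows from the table above, keyed by ID; None represents NULL.
rows = {
    1: (10, 20, 10),
    2: (0, 10, 20),
    3: (100, 100, 100),
    4: (None, 0, 10),
}

counts = {row_id: unique_pays(pays) for row_id, pays in rows.items()}
print(counts)
```

Several NULLs in one row collapse into a single value under this rule, which matters for the typical row where half of the eight columns are empty.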

The most I've been able to accomplish thus far (which I just realized isn't even accurate) is counting the total number of Pay entries in each row:

nvl(length(length(Pay1)),0)
+nvl(length(length(Pay2)),0)
+nvl(length(length(Pay3)),0) "NumPays"

The typical row only has 4 of the 8 columns populated, with the rest being NULL, and the maximum integer in any Pay column is 999 (hence the length(length()) conversion attempt).

My SQL skills are primitive but any help is appreciated!

If you have, or can create, a user-defined table type of numbers, you could create a collection, use the set function to get rid of duplicates, and then use the cardinality function to count the remaining values:

cardinality(set(t_num(pay1, pay2, pay3))) as uniquepays

To include all eight of your columns, just add the extra column names to the list passed to the t_num() constructor.

cardinality(set(t_num(pay1, pay2, pay3, pay4, pay5, pay6, pay7, pay8))) as uniquepays

Demo with your sample table generated as a CTE:

create type t_num as table of number
/

with t (ID, Description, Pay1, Pay2, Pay3) as (
  select 1, 'asdf1', 10, 20, 10 from dual
  union all select 2, 'asdf2', 0, 10, 20 from dual
  union all select 3, 'asdf3', 100, 100, 100 from dual
  union all select 4, 'asdf4', null, 0, 10 from dual
)
select id, description, pay1, pay2, pay3,
  cardinality(set(t_num(pay1, pay2, pay3))) as uniquepays
from t
order by id;

        ID DESCR       PAY1       PAY2       PAY3 UNIQUEPAYS
---------- ----- ---------- ---------- ---------- ----------
         1 asdf1         10         20         10          2
         2 asdf2          0         10         20          3
         3 asdf3        100        100        100          1
         4 asdf4                     0         10          3

Whether that is efficient enough with millions of rows will need to be tested.

Here is one relatively simple way:

CREATE TYPE number_list AS TABLE OF NUMBER;

with t (ID, Description, Pay1, Pay2, Pay3) as (
  select 1, 'asdf1', 10, 20, 10 from dual
  union all select 2, 'asdf2', 0, 10, 20 from dual
  union all select 3, 'asdf3', 100, 100, 100 from dual
  union all select 4, 'asdf4', null, 0, 10 from dual
)
SELECT id,
       description,
       pay1,
       pay2,
       pay3,
       (SELECT COUNT (DISTINCT NVL (TO_CHAR (COLUMN_VALUE), '#NULL#')) 
        FROM TABLE (number_list (pay1, pay2, pay3))) uniquepays
FROM   t;

        ID DESCR       PAY1       PAY2       PAY3 UNIQUEPAYS
---------- ----- ---------- ---------- ---------- ----------
         1 asdf1         10         20         10          2
         2 asdf2          0         10         20          3
         3 asdf3        100        100        100          1
         4 asdf4                     0         10          3

Split out each value into its own row (like it should have been stored in the first place), then union them up and (since union discards duplicates) just count the rows:

select id, description, count(*) unique_pays from (
    select id, description, nvl(pay1, -1) from mytable
    union select id, description, nvl(pay2, -1) from mytable
    union select id, description, nvl(pay3, -1) from mytable
    union select id, description, nvl(pay4, -1) from mytable
    union select id, description, nvl(pay5, -1) from mytable
    union select id, description, nvl(pay6, -1) from mytable
    union select id, description, nvl(pay7, -1) from mytable
    union select id, description, nvl(pay8, -1) from mytable
) x
group by id, description

I changed nulls into -1 so they would participate cleanly in the deduping.
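As a quick sanity check of the union approach, here is a small sketch that runs the same query shape against an in-memory SQLite database from Python. Several things here are assumptions rather than part of the answer: SQLite stands in for Oracle, coalesce replaces nvl (coalesce is also available in Oracle), the table is cut down to three Pay columns, and the alias `pay` is added for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable (id integer, description text, "
    "pay1 integer, pay2 integer, pay3 integer)"
)
conn.executemany(
    "insert into mytable values (?, ?, ?, ?, ?)",
    [
        (1, "asdf1", 10, 20, 10),
        (2, "asdf2", 0, 10, 20),
        (3, "asdf3", 100, 100, 100),
        (4, "asdf4", None, 0, 10),
    ],
)

# UNION (not UNION ALL) removes duplicate (id, description, pay) rows,
# so count(*) per id is the number of distinct values in that row.
query = """
select id, description, count(*) as unique_pays from (
    select id, description, coalesce(pay1, -1) as pay from mytable
    union select id, description, coalesce(pay2, -1) from mytable
    union select id, description, coalesce(pay3, -1) from mytable
) x
group by id, description
order by id
"""
result = conn.execute(query).fetchall()
print(result)
```

Two caveats apply: -1 must never occur as a real Pay value, and two source rows sharing the same id and description would be merged into one group.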

Here is a solution that reads the base table just once, and takes advantage of the data being organized in rows already. (Unpivoting would be inefficient, since this information would be lost, resulting in massive additional work.)

It assumes all NULLs are counted as the same. If instead they should be considered different from each other, change the -1 in nvl to distinct values: -1 for Pay1, -2 for Pay2, etc.

with
     inputs( ID, Description, Pay1, Pay2, Pay3 ) as (     
       select 1, 'asdf1',                   10,  20,  10 from dual union all
       select 2, 'asdf2',                    0,  10,  20 from dual union all
       select 3, 'asdf3',                  100, 100, 100 from dual union all
       select 4, 'asdf4', cast(null as number),   0,  10 from dual
     )
--  End of TEST data (not part of solution!) SQL query begins BELOW THIS LINE.
select   id, description, pay1, pay2, pay3,
           1
         + case when nvl(pay2, -1) not in (nvl(pay1, -1)) 
                then 1 else 0 end
         + case when nvl(pay3, -1) not in (nvl(pay1, -1), nvl(pay2, -1))
                then 1 else 0 end
                                       as distinct_pays
from     inputs
order by id   --  if needed
;

ID DESCRIPTION     PAY1    PAY2    PAY3 DISTINCT_PAYS
-- ------------ ------- ------- ------- -------------
 1 asdf1             10      20      10             2
 2 asdf2              0      10      20             3
 3 asdf3            100     100     100             1
 4 asdf4                      0      10             3

4 rows selected.
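Writing this expression out by hand for all eight columns is error-prone, so one option is to generate it. The sketch below is only an illustration: the function name `distinct_count_expr` is hypothetical, it emits coalesce instead of nvl so the same text can be checked in SQLite (coalesce also exists in Oracle), and it assumes -1 never appears as a real Pay value.

```python
import sqlite3

def distinct_count_expr(columns, sentinel=-1):
    """Build '1 + case when colN not in (earlier columns) then 1 else 0 end + ...'."""
    wrapped = [f"coalesce({col}, {sentinel})" for col in columns]
    terms = ["1"]  # the first column always contributes one distinct value
    for i in range(1, len(wrapped)):
        earlier = ", ".join(wrapped[:i])
        terms.append(
            f"case when {wrapped[i]} not in ({earlier}) then 1 else 0 end"
        )
    return "\n  + ".join(terms)

columns = [f"pay{i}" for i in range(1, 9)]
expr = distinct_count_expr(columns)

# Check the generated expression against a tiny in-memory table.
conn = sqlite3.connect(":memory:")
column_defs = ", ".join(f"{col} integer" for col in columns)
conn.execute(f"create table inputs (id integer, {column_defs})")
conn.executemany(
    "insert into inputs values (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 10, 20, 10, None, None, 0, 20, 999),
        (2, None, None, None, None, None, None, None, None),
    ],
)
result = conn.execute(
    f"select id, {expr} as distinct_pays from inputs order by id"
).fetchall()
print(expr)
print(result)
```

The generated text can be pasted into the select list in place of the hand-written three-column version. A row that is entirely NULL counts as 1, consistent with treating all NULLs as the same value.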

The solution would be:

  1. Start with your initial table without the column #UniquePays.
  2. Unpivot your table.

From this:

ID   Description   Pay1    Pay2   Pay3 
1    asdf1         10      20     10  

Make this:

ID seq Description Pay
 1   1 asdf1       10
 1   2 asdf1       20
 1   3 asdf1       10

  3. From the unpivoted table, run a SELECT COUNT(DISTINCT Pay) for each ID.
  4. Re-pivot the table, adding the COUNT(DISTINCT Pay).
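The steps above can be sketched roughly as follows, run against an in-memory SQLite database from Python. This is an illustration under assumptions, not the answerer's script: the table name `pays` is made up, the unpivot is done with UNION ALL rather than Oracle's UNPIVOT clause, and the counts are joined back onto the original rows instead of literally re-pivoting. Because COUNT(DISTINCT ...) skips NULLs, the sketch maps NULL to -1 first, assuming -1 is never a real Pay value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table pays (id integer, description text, "
    "pay1 integer, pay2 integer, pay3 integer)"
)
conn.executemany(
    "insert into pays values (?, ?, ?, ?, ?)",
    [
        (1, "asdf1", 10, 20, 10),
        (2, "asdf2", 0, 10, 20),
        (3, "asdf3", 100, 100, 100),
        (4, "asdf4", None, 0, 10),
    ],
)

query = """
with unpivoted as (
    -- step 2: one output row per (id, seq) pair
    select id, 1 as seq, pay1 as pay from pays
    union all select id, 2, pay2 from pays
    union all select id, 3, pay3 from pays
),
counted as (
    -- step 3: NULL is mapped to -1 so it is counted as a value
    select id, count(distinct coalesce(pay, -1)) as unique_pays
    from unpivoted
    group by id
)
-- step 4: attach the count to the original wide rows
select p.id, p.description, p.pay1, p.pay2, p.pay3, c.unique_pays
from pays p
join counted c on c.id = p.id
order by p.id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Without the coalesce, row 4 would come out as 2 rather than 3, which does not match the requirement that NULL be counted.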

Will this do, or do you need an example script? I've been posting quite a bit about pivoting and un-pivoting lately... it seems to be a popular need :-]

Marco the Sane

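You could write an insert trigger or a stored procedure that computes the number of unique values for each inserted row and stores it in the unique-count column.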
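A rough sketch of the trigger idea, using SQLite from Python only because it is easy to run. The table name `pays`, the column name `unique_pays`, and the trigger name are assumptions, the table is cut down to three Pay columns, and Oracle trigger syntax differs from what is shown here. NULL is mapped to -1, assuming -1 is never a real Pay value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table pays (
        id integer primary key,
        pay1 integer, pay2 integer, pay3 integer,
        unique_pays integer
    )
""")

# After each insert, compute the distinct count for the new row and store it.
conn.execute("""
    create trigger pays_set_unique after insert on pays
    begin
        update pays
        set unique_pays =
              1
            + case when coalesce(new.pay2, -1) not in (coalesce(new.pay1, -1))
                   then 1 else 0 end
            + case when coalesce(new.pay3, -1)
                        not in (coalesce(new.pay1, -1), coalesce(new.pay2, -1))
                   then 1 else 0 end
        where id = new.id;
    end
""")

conn.executemany(
    "insert into pays (id, pay1, pay2, pay3) values (?, ?, ?, ?)",
    [(1, 10, 20, 10), (2, 0, 10, 20), (3, 100, 100, 100), (4, None, 0, 10)],
)
stored = conn.execute("select id, unique_pays from pays order by id").fetchall()
print(stored)
```

This moves the cost from query time to insert time. Later updates to the Pay columns would leave the stored count stale unless a matching update trigger is added.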
