Count the number of distinct values of each row (SQL)

How can I create a new column that returns the number of distinct values in each row inside my table? For instance,

ID   Description   Pay1    Pay2   Pay3    #UniquePays     
1    asdf1         10      20     10      2
2    asdf2         0       10     20      3
3    asdf3         100     100    100     1
4    asdf4                 0      10      3

The query may return more than 1 million rows, so it needs to be reasonably efficient. There are 8 'Pay' columns in total, each of which is either NULL or an integer. Also note that 0 should be counted as distinct from NULL.

The most I've been able to accomplish thus far (which I just realized isn't even accurate) is counting the total number of Pay entries in each row:

nvl(length(length(Pay1)),0)
+nvl(length(length(Pay2)),0)
+nvl(length(length(Pay3)),0) "NumPays"

The typical row only has 4 of the 8 columns populated, with the rest being null, and the max integer in the Pay column is '999' (hence the length-length conversion attempt..)

My SQL skills are primitive but any help is appreciated!

If you have, or can create, a user-defined collection type (a table of numbers), you could create a collection from the columns, use the set function to get rid of duplicates, and then use the cardinality function to count the remaining values:

cardinality(set(t_num(pay1, pay2, pay3))) as uniquepays

To include all eight of your columns, just add the extra column names to the list passed to the t_num() constructor:

cardinality(set(t_num(pay1, pay2, pay3, pay4, pay5, pay6, pay7, pay8))) as uniquepays

Demo with your sample table generated as a CTE:

create type t_num as table of number
/

with t (ID, Description, Pay1, Pay2, Pay3) as (
  select 1, 'asdf1', 10, 20, 10 from dual
  union all select 2, 'asdf2', 0, 10, 20 from dual
  union all select 3, 'asdf3', 100, 100, 100 from dual
  union all select 4, 'asdf4', null, 0, 10 from dual
)
select id, description, pay1, pay2, pay3,
  cardinality(set(t_num(pay1, pay2, pay3))) as uniquepays
from t
order by id;

        ID DESCR       PAY1       PAY2       PAY3 UNIQUEPAYS
---------- ----- ---------- ---------- ---------- ----------
         1 asdf1         10         20         10          2
         2 asdf2          0         10         20          3
         3 asdf3        100        100        100          1
         4 asdf4                     0         10          3

Whether that is efficient enough with millions of rows will need to be tested.

Here is one relatively simple way:

CREATE TYPE number_list AS TABLE OF NUMBER;

with t (ID, Description, Pay1, Pay2, Pay3) as (
  select 1, 'asdf1', 10, 20, 10 from dual
  union all select 2, 'asdf2', 0, 10, 20 from dual
  union all select 3, 'asdf3', 100, 100, 100 from dual
  union all select 4, 'asdf4', null, 0, 10 from dual
)
SELECT id,
       description,
       pay1,
       pay2,
       pay3,
       (SELECT COUNT (DISTINCT NVL (TO_CHAR (COLUMN_VALUE), '#NULL#')) 
        FROM TABLE (number_list (pay1, pay2, pay3))) uniquepays
FROM   t;

        ID DESCR       PAY1       PAY2       PAY3 UNIQUEPAYS
---------- ----- ---------- ---------- ---------- ----------
         1 asdf1         10         20         10          2
         2 asdf2          0         10         20          3
         3 asdf3        100        100        100          1
         4 asdf4                     0         10          3

Split out each value into its own row (as it should have been stored in the first place), then union them up and (since union discards duplicates) just count the rows:

select id, description, count(*) unique_pays from (
    select id, description, nvl(pay1, -1) as pay from mytable
    union select id, description, nvl(pay2, -1) from mytable
    union select id, description, nvl(pay3, -1) from mytable
    union select id, description, nvl(pay4, -1) from mytable
    union select id, description, nvl(pay5, -1) from mytable
    union select id, description, nvl(pay6, -1) from mytable
    union select id, description, nvl(pay7, -1) from mytable
    union select id, description, nvl(pay8, -1) from mytable
) x
group by id, description;

I changed nulls into -1 so they would participate cleanly in the deduping.

Here is a solution that reads the base table just once and takes advantage of the data already being organized in rows. (Unpivoting would be inefficient: it discards the fact that the values already sit together in one row, so they would have to be regrouped afterwards, which is a lot of additional work.)

It assumes all NULLs are counted as the same value. If they should instead be considered different from each other, change the -1 in nvl to distinct values: -1 for Pay1, -2 for Pay2, etc.

with
     inputs( ID, Description, Pay1, Pay2, Pay3 ) as (     
       select 1, 'asdf1',                   10,  20,  10 from dual union all
       select 2, 'asdf2',                    0,  10,  20 from dual union all
       select 3, 'asdf3',                  100, 100, 100 from dual union all
       select 4, 'asdf4', cast(null as number),   0,  10 from dual
     )
--  End of TEST data (not part of solution!) SQL query begins BELOW THIS LINE.
select   id, description, pay1, pay2, pay3,
           1
         + case when nvl(pay2, -1) not in (nvl(pay1, -1)) 
                then 1 else 0 end
         + case when nvl(pay3, -1) not in (nvl(pay1, -1), nvl(pay2, -1))
                then 1 else 0 end
                                       as distinct_pays
from     inputs
order by id   --  if needed
;

ID DESCRIPTION     PAY1    PAY2    PAY3 DISTINCT_PAYS
-- ------------ ------- ------- ------- -------------
 1 asdf1             10      20      10             2
 2 asdf2              0      10      20             3
 3 asdf3            100     100     100             1
 4 asdf4                      0      10             3

4 rows selected.
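
The question mentions eight Pay columns, so the same pattern can be extended: each column adds 1 only if its value has not appeared in any earlier column. What follows is a sketch rather than a tested production query. The column names pay4 through pay8 and the two test rows are made up for illustration, and it assumes -1 never occurs as a real pay value.

```sql
with
     inputs( id, description, pay1, pay2, pay3, pay4, pay5, pay6, pay7, pay8 ) as (
       -- hypothetical test rows; the null casts keep every pay column typed as NUMBER
       select 1, 'asdf1', 10, 20, 10, cast(null as number), cast(null as number),
              cast(null as number), cast(null as number), cast(null as number) from dual union all
       select 2, 'asdf2', 0, cast(null as number), 0, 5, 5,
              cast(null as number), 7, 0 from dual
     )
select   id, description,
           1   -- pay1 (or its NULL placeholder) is always the first distinct value
         + case when nvl(pay2, -1) not in (nvl(pay1, -1))
                then 1 else 0 end
         + case when nvl(pay3, -1) not in (nvl(pay1, -1), nvl(pay2, -1))
                then 1 else 0 end
         + case when nvl(pay4, -1) not in (nvl(pay1, -1), nvl(pay2, -1), nvl(pay3, -1))
                then 1 else 0 end
         + case when nvl(pay5, -1) not in (nvl(pay1, -1), nvl(pay2, -1), nvl(pay3, -1),
                                           nvl(pay4, -1))
                then 1 else 0 end
         + case when nvl(pay6, -1) not in (nvl(pay1, -1), nvl(pay2, -1), nvl(pay3, -1),
                                           nvl(pay4, -1), nvl(pay5, -1))
                then 1 else 0 end
         + case when nvl(pay7, -1) not in (nvl(pay1, -1), nvl(pay2, -1), nvl(pay3, -1),
                                           nvl(pay4, -1), nvl(pay5, -1), nvl(pay6, -1))
                then 1 else 0 end
         + case when nvl(pay8, -1) not in (nvl(pay1, -1), nvl(pay2, -1), nvl(pay3, -1),
                                           nvl(pay4, -1), nvl(pay5, -1), nvl(pay6, -1),
                                           nvl(pay7, -1))
                then 1 else 0 end
                                       as distinct_pays
from     inputs
order by id;
-- id 1 -> 3 (10, 20, NULL); id 2 -> 4 (0, NULL, 5, 7)
```

The expression grows quadratically with the number of columns, but it is evaluated row by row with no sorting or grouping, which is the point of this approach.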

The solution would be:

  1. Start with your initial table without the column #uniquePays.
  2. Unpivot your table.

From this

ID   Description   Pay1    Pay2   Pay3 
1    asdf1         10      20     10  

Make this:

ID seq Description Pay
 1   1 asdf1       10
 1   2 asdf1       20
 1   3 asdf1       10
  3. From the unpivoted table, run a SELECT COUNT(DISTINCT Pay)
  4. Re-pivot the table, adding the COUNT(DISTINCT Pay).

Will this do, or do you need an exemplary script? I've been posting quite a bit about pivoting and un-pivoting lately .... seems to be a popular need :-]
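
The steps above could be sketched roughly as follows in Oracle 11g or later, using the UNPIVOT clause. This is only one possible reading of the outline. Instead of re-pivoting, it joins the counts back to the original rows, which gives the same result. The CTE names inputs and counts are made up for the example, and -1 is assumed never to be a real pay value. Note that COUNT(DISTINCT ...) skips NULLs, so they are mapped to -1 first, and INCLUDE NULLS is needed so that UNPIVOT does not drop them.

```sql
with
     inputs( id, description, pay1, pay2, pay3 ) as (
       select 1, 'asdf1',                   10,  20,  10 from dual union all
       select 2, 'asdf2',                    0,  10,  20 from dual union all
       select 3, 'asdf3',                  100, 100, 100 from dual union all
       select 4, 'asdf4', cast(null as number),   0,  10 from dual
     ),
     counts as (
       -- steps 2 and 3: one row per (id, seq), then count distinct values per id
       select   id, count(distinct nvl(pay, -1)) as uniquepays
       from     inputs
       unpivot  include nulls (pay for seq in (pay1 as 1, pay2 as 2, pay3 as 3))
       group by id
     )
-- step 4: attach the count to the original wide row
select   i.id, i.description, i.pay1, i.pay2, i.pay3, c.uniquepays
from     inputs i
         join counts c on c.id = i.id
order by i.id;
-- uniquepays for ids 1..4: 2, 3, 1, 3
```

Whether this is fast enough on millions of rows would need to be tested, since it groups the unpivoted rows and joins back to the base table.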

Marco the Sane

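You could write an insert trigger or stored procedure that computes the number of unique values for each inserted row and stores it in the unique-count column.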
