How can I create a new column that returns the number of distinct values in each row inside my table? For instance,
ID  Description  Pay1  Pay2  Pay3  #UniquePays
1   asdf1        10    20    10    2
2   asdf2        0     10    20    3
3   asdf3        100   100   100   1
4   asdf4        NULL  0     10    3
The query may return more than 1 million rows, so it needs to be reasonably efficient. There are 8 Pay columns in total, each of which is either NULL or an integer. Also note that 0 should be counted as distinct from NULL.
The most I've been able to accomplish thus far (which I just realized isn't even accurate) is counting the total number of Pay entries in each row:
nvl(length(length(Pay1)),0)
+nvl(length(length(Pay2)),0)
+nvl(length(length(Pay3)),0) "NumPays"
The typical row only has 4 of the 8 columns populated, with the rest being NULL, and the maximum value in any Pay column is 999 (hence the LENGTH(LENGTH()) attempt).
My SQL skills are primitive but any help is appreciated!
If you have, or can create, a user-defined table of numbers, you could create a collection, use the SET function to get rid of duplicates, and then use the CARDINALITY function to count the remaining values:
cardinality(set(t_num(pay1, pay2, pay3))) as uniquepays
To include all eight of your columns, just add the extra column names to the list passed to the t_num() constructor:
cardinality(set(t_num(pay1, pay2, pay3, pay4, pay5, pay6, pay7, pay8))) as uniquepays
Demo with your sample table generated as a CTE:
create type t_num as table of number
/
with t (ID, Description, Pay1, Pay2, Pay3) as (
select 1, 'asdf1', 10, 20, 10 from dual
union all select 2, 'asdf2', 0, 10, 20 from dual
union all select 3, 'asdf3', 100, 100, 100 from dual
union all select 4, 'asdf4', null, 0, 10 from dual
)
select id, description, pay1, pay2, pay3,
cardinality(set(t_num(pay1, pay2, pay3))) as uniquepays
from t
order by id;
ID  DESCR  PAY1  PAY2  PAY3  UNIQUEPAYS
--  -----  ----  ----  ----  ----------
1   asdf1  10    20    10    2
2   asdf2  0     10    20    3
3   asdf3  100   100   100   1
4   asdf4        0     10    3
Whether that is efficient enough with millions of rows will need to be tested.
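To check the expected semantics outside Oracle, here is a rough Python analogue. It is not Oracle code. Building a Python set from the row's values plays the role of SET, and len() plays the role of CARDINALITY. None stands in for NULL, and a Python set keeps at most one None. That matches the sample output above, where ID 4 (NULL, 0, 10) counts as three values.

```python
# Rough Python analogue of CARDINALITY(SET(t_num(pay1, ..., pay8))).
# None stands in for SQL NULL; a set keeps at most one None,
# so NULL counts as one distinct value, separate from 0.
from typing import Optional, Sequence


def unique_pays(pays: Sequence[Optional[int]]) -> int:
    """Count distinct values in one row, treating None (NULL) as a value."""
    return len(set(pays))


rows = [
    (1, "asdf1", [10, 20, 10]),
    (2, "asdf2", [0, 10, 20]),
    (3, "asdf3", [100, 100, 100]),
    (4, "asdf4", [None, 0, 10]),
]

for row_id, description, pays in rows:
    print(row_id, description, unique_pays(pays))
# 1 asdf1 2
# 2 asdf2 3
# 3 asdf3 1
# 4 asdf4 3
```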
Here is one relatively simple way:
CREATE TYPE number_list AS TABLE OF NUMBER;
/
with t (ID, Description, Pay1, Pay2, Pay3) as (
select 1, 'asdf1', 10, 20, 10 from dual
union all select 2, 'asdf2', 0, 10, 20 from dual
union all select 3, 'asdf3', 100, 100, 100 from dual
union all select 4, 'asdf4', null, 0, 10 from dual
)
SELECT id,
description,
pay1,
pay2,
pay3,
(SELECT COUNT (DISTINCT NVL (TO_CHAR (COLUMN_VALUE), '#NULL#'))
FROM TABLE (number_list (pay1, pay2, pay3))) uniquepays
FROM t;
ID  DESCR  PAY1  PAY2  PAY3  UNIQUEPAYS
--  -----  ----  ----  ----  ----------
1   asdf1  10    20    10    2
2   asdf2  0     10    20    3
3   asdf3  100   100   100   1
4   asdf4        0     10    3
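The NVL(TO_CHAR(COLUMN_VALUE), '#NULL#') wrapper matters because COUNT(DISTINCT ...) ignores NULLs. The sketch below shows the effect using Python's built-in sqlite3 module as a stand-in for Oracle. SQLite's IFNULL and CAST replace NVL and TO_CHAR here, on the assumption that they behave the same for this purpose. It is an illustration, not the Oracle query itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row's Pay values (NULL, 0, 10) as a derived table.
values_sql = "SELECT NULL AS v UNION ALL SELECT 0 UNION ALL SELECT 10"

plain, with_sentinel = conn.execute(
    f"""
    SELECT COUNT(DISTINCT v),                                  -- NULL ignored
           COUNT(DISTINCT IFNULL(CAST(v AS TEXT), '#NULL#'))   -- NULL counted
    FROM ({values_sql})
    """
).fetchone()

print(plain, with_sentinel)  # 2 3
conn.close()
```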
Split out each value into its own row (as it should have been stored in the first place), then UNION them together and, since UNION discards duplicates, just count the rows:
select id, description, count(*) unique_pays from (
select id, description, nvl(pay1, -1) from mytable
union select id, description, nvl(pay2, -1) from mytable
union select id, description, nvl(pay3, -1) from mytable
union select id, description, nvl(pay4, -1) from mytable
union select id, description, nvl(pay5, -1) from mytable
union select id, description, nvl(pay6, -1) from mytable
union select id, description, nvl(pay7, -1) from mytable
union select id, description, nvl(pay8, -1) from mytable
) x
group by id, description
I changed NULLs into -1 so they would participate cleanly in the deduping; this assumes -1 never occurs as a real Pay value.
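Here is a runnable sketch of the UNION approach, again with Python's sqlite3 standing in for Oracle and IFNULL in place of NVL. The table name mytable follows the answer. Only three Pay columns are created, to mirror the sample data, and the result is ordered by id for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER, description TEXT,"
    " pay1 INTEGER, pay2 INTEGER, pay3 INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [
        (1, "asdf1", 10, 20, 10),
        (2, "asdf2", 0, 10, 20),
        (3, "asdf3", 100, 100, 100),
        (4, "asdf4", None, 0, 10),
    ],
)

# UNION (not UNION ALL) removes duplicate (id, description, pay) triples,
# so counting the surviving rows per id gives the number of distinct pays.
# NULL is mapped to -1, assuming -1 never appears as a real pay value.
query = """
SELECT id, description, COUNT(*) AS unique_pays FROM (
          SELECT id, description, IFNULL(pay1, -1) AS pay FROM mytable
    UNION SELECT id, description, IFNULL(pay2, -1) FROM mytable
    UNION SELECT id, description, IFNULL(pay3, -1) FROM mytable
) x
GROUP BY id, description
ORDER BY id
"""
result = conn.execute(query).fetchall()
print(result)
# [(1, 'asdf1', 2), (2, 'asdf2', 3), (3, 'asdf3', 1), (4, 'asdf4', 3)]
conn.close()
```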
Here is a solution that reads the base table just once and takes advantage of the data already being organized in rows. (Unpivoting would be inefficient, since that organization would be lost, resulting in a lot of additional work.)
It assumes all NULLs are counted as the same. If instead they should be considered different from each other, change the -1 in nvl to distinct values: -1 for Pay1, -2 for Pay2, etc.
with
inputs( ID, Description, Pay1, Pay2, Pay3 ) as (
select 1, 'asdf1', 10, 20, 10 from dual union all
select 2, 'asdf2', 0, 10, 20 from dual union all
select 3, 'asdf3', 100, 100, 100 from dual union all
select 4, 'asdf4', cast(null as number), 0, 10 from dual
)
-- End of TEST data (not part of solution!) SQL query begins BELOW THIS LINE.
select id, description, pay1, pay2, pay3,
1
+ case when nvl(pay2, -1) not in (nvl(pay1, -1))
then 1 else 0 end
+ case when nvl(pay3, -1) not in (nvl(pay1, -1), nvl(pay2, -1))
then 1 else 0 end
as distinct_pays
from inputs
order by id -- if needed
;
ID  DESCRIPTION  PAY1  PAY2  PAY3  DISTINCT_PAYS
--  -----------  ----  ----  ----  -------------
1   asdf1        10    20    10    2
2   asdf2        0     10    20    3
3   asdf3        100   100   100   1
4   asdf4              0     10    3
4 rows selected.
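Writing the CASE terms for all eight columns by hand is error-prone. Below is a hypothetical Python helper, build_distinct_expr (not part of the answer), that generates the same pattern for any number of Pay columns. It can optionally use -1, -2, ... as per-column NULL sentinels, as suggested above. The expression is evaluated on SQLite via sqlite3 with IFNULL. Passing nvl="NVL" produces text for Oracle instead.

```python
import sqlite3


def build_distinct_expr(n: int, nulls_distinct: bool = False, nvl: str = "IFNULL") -> str:
    """Build '1 + CASE ... END + ...' counting distinct values across pay1..payN.

    Column k adds 1 only when its (NULL-mapped) value is NOT IN the values of
    columns 1..k-1. With nulls_distinct=True, column k maps NULL to -k instead
    of -1, so NULLs in different columns no longer match each other.
    """
    def wrapped(k: int) -> str:
        sentinel = -k if nulls_distinct else -1
        return f"{nvl}(pay{k}, {sentinel})"

    terms = ["1"]
    for k in range(2, n + 1):
        earlier = ", ".join(wrapped(j) for j in range(1, k))
        terms.append(f"CASE WHEN {wrapped(k)} NOT IN ({earlier}) THEN 1 ELSE 0 END")
    return "\n  + ".join(terms)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inputs (id INTEGER, description TEXT,"
    " pay1 INTEGER, pay2 INTEGER, pay3 INTEGER)"
)
conn.executemany(
    "INSERT INTO inputs VALUES (?, ?, ?, ?, ?)",
    [
        (1, "asdf1", 10, 20, 10),
        (2, "asdf2", 0, 10, 20),
        (3, "asdf3", 100, 100, 100),
        (4, "asdf4", None, 0, 10),
    ],
)

# Three-column version, evaluated by SQLite (IFNULL instead of NVL).
sql = f"SELECT id, description, {build_distinct_expr(3)} AS distinct_pays FROM inputs ORDER BY id"
rows = conn.execute(sql).fetchall()
print(rows)  # [(1, 'asdf1', 2), (2, 'asdf2', 3), (3, 'asdf3', 1), (4, 'asdf4', 3)]

# Eight-column expression text with NVL, for pasting into an Oracle query.
print(build_distinct_expr(8, nvl="NVL"))
```

The number of comparisons grows quadratically with the column count (28 for eight columns), but everything happens within a single pass over the table.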
The solution would be to un-pivot the Pay columns into rows and then count the distinct values per ID to get #uniquePays. From this:
ID  Description  Pay1  Pay2  Pay3
1   asdf1        10    20    10
Make this:
ID  seq  Description  Pay
1   1    asdf1        10
1   2    asdf1        20
1   3    asdf1        10
Will this do, or do you need an example script? I've been posting quite a bit about pivoting and un-pivoting lately; it seems to be a popular need :-]
Marco the Sane
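Here is a small sketch of the un-pivot idea in plain Python (not the script offered above). Each wide row becomes (ID, seq, Description, Pay) rows, as in the example, and the distinct Pay values are then counted per ID. None stands in for NULL and, as the question requires, counts as its own value.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

WideRow = Tuple[int, str, List[Optional[int]]]
LongRow = Tuple[int, int, str, Optional[int]]


def unpivot(rows: Iterable[WideRow]) -> List[LongRow]:
    """Turn (ID, Description, [Pay1, Pay2, ...]) into (ID, seq, Description, Pay) rows."""
    return [
        (row_id, seq, description, pay)
        for row_id, description, pays in rows
        for seq, pay in enumerate(pays, start=1)
    ]


def count_unique_pays(long_rows: Iterable[LongRow]) -> Dict[int, int]:
    """Group the un-pivoted rows by ID and count distinct Pay values (None included)."""
    seen: Dict[int, Set[Optional[int]]] = defaultdict(set)
    for row_id, _seq, _description, pay in long_rows:
        seen[row_id].add(pay)
    return {row_id: len(pays) for row_id, pays in seen.items()}


wide = [
    (1, "asdf1", [10, 20, 10]),
    (2, "asdf2", [0, 10, 20]),
    (3, "asdf3", [100, 100, 100]),
    (4, "asdf4", [None, 0, 10]),
]
long_rows = unpivot(wide)
print(long_rows[:3])  # [(1, 1, 'asdf1', 10), (1, 2, 'asdf1', 20), (1, 3, 'asdf1', 10)]
print(count_unique_pays(long_rows))  # {1: 2, 2: 3, 3: 1, 4: 3}
```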
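You could write an insert trigger or a stored procedure that calculates the total number of unique values for each insert and stores it in the unique-values column.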
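Here is a minimal sketch of the trigger idea, run on SQLite through Python's sqlite3. Oracle's trigger syntax differs, so this is not Oracle code. The table name payments and the column uniquepays are hypothetical. The count reuses the CASE/NOT IN technique from the earlier answer, with IFNULL in place of NVL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE payments (
        id          INTEGER PRIMARY KEY,
        description TEXT,
        pay1 INTEGER, pay2 INTEGER, pay3 INTEGER,
        uniquepays  INTEGER
    );

    -- After each insert, store the distinct-value count for the new row.
    -- NULL is mapped to -1 (assumes -1 never occurs as a real pay).
    CREATE TRIGGER payments_unique_pays
    AFTER INSERT ON payments
    FOR EACH ROW
    BEGIN
        UPDATE payments
        SET uniquepays = 1
            + CASE WHEN IFNULL(NEW.pay2, -1) NOT IN (IFNULL(NEW.pay1, -1))
                   THEN 1 ELSE 0 END
            + CASE WHEN IFNULL(NEW.pay3, -1) NOT IN (IFNULL(NEW.pay1, -1),
                                                     IFNULL(NEW.pay2, -1))
                   THEN 1 ELSE 0 END
        WHERE id = NEW.id;
    END;
    """
)
conn.executemany(
    "INSERT INTO payments (id, description, pay1, pay2, pay3) VALUES (?, ?, ?, ?, ?)",
    [
        (1, "asdf1", 10, 20, 10),
        (2, "asdf2", 0, 10, 20),
        (3, "asdf3", 100, 100, 100),
        (4, "asdf4", None, 0, 10),
    ],
)
stored = conn.execute("SELECT id, uniquepays FROM payments ORDER BY id").fetchall()
print(stored)  # [(1, 2), (2, 3), (3, 1), (4, 3)]
```

This only covers inserts. If Pay values can change later, a matching update trigger would also be needed to keep the stored count in sync.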