简体   繁体   中英

postgresql - count distinct combination of three columns- order doesn't matter

I'm trying to count distinct combinations of three columns, order of the columns doesn't matter

sample:

    a a a 
    a a b 
    a b a 
    b b a 
    b a b

the result I'm getting:

a a a 1
a a b 1
a b a 1
b b a 1
b a b 1

desired result

aaa 1
aab 2
bba 2

You can use an ordered array

select  v[1], v[2], v[3], count(*) n
from tbl t
cross join lateral (
  select array_agg(col order by col) v
  from (
    values (c1),(c2),(c3)
  ) t(col)
) s
group by v[1], v[2], v[3];

db<>fiddle

Maybe you can use checksums for getting the required result eg if it is really just combinations 'a' and 'b' that you are dealing with, you could convert the letters to integers (by calling the ASCII() function) and add these up so that you get a checksum.

TABLE

create table t (c1, c2, c3 ) as
select 'a', 'a', 'a' union all 
select 'a', 'a', 'b' union all 
select 'a', 'b', 'a' union all 
select 'b', 'b', 'a' union all 
select 'b', 'a', 'b' ;

Checksums

select c1, c2, c3, ascii( c1 ) + ascii( c2 ) + ascii( c3 ) as checksum 
from t ;

-- output
c1  c2  c3  checksum
a   a   a   291
a   a   b   292
a   b   a   292
b   b   a   293
b   a   b   293

If this works for you, then you can use window functions eg

select c1, c2, c3, rc_ as rowcount
from (
  select c1, c2, c3
  , count(*) over ( partition by ascii( c1 ) + ascii( c2 ) + ascii( c3 ) order by 1 ) rc_
  , row_number() over ( partition by ascii( c1 ) + ascii( c2 ) + ascii( c3 ) order by 1 ) rn_  
  from t 
) sq
where rc_ = rn_ ;

-- output
c1  c2  c3  rowcount
a   a   a   1
a   b   a   2
b   a   b   2

See dbfiddle .

If you are dealing with strings that cannot easily converted to integers, you could create a mapping between the strings and integers, and implement the map_ as a view (so that it is easy to use in subsequent queries) eg

MAP

-- {1} find all distinct elements
-- {2} map each element to an integer
create view map_
as
select val_, rank() over ( order by val_ ) weight_
from (
  select distinct val_
  from (
    select distinct c1 val_ from t union all
    select distinct c2 from t union all
    select distinct c3 from t
  ) all_elements 
) unique_elements ;

Once you have this map, you can use its values for creating checksums (maybe also in a view)...

Checksums

create view t_checksums_
as
select c1, c2, c3, c1weight + c2weight + c3weight as checksum
from (
  select
    c1, ( select weight_ from map_ where c1 = map_.val_ ) c1weight
  , c2, ( select weight_ from map_ where c2 = map_.val_ ) c2weight
  , c3, ( select weight_ from map_ where c3 = map_.val_ ) c3weight
from t 
) valandweight ;

... and then, you can use the same query as before, for obtaining the final result - see dbfiddle .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM