简体   繁体   中英

SQL - Group by unique column sets

Building off of the question " SQL - Turn relationship IDs into a delimited list ", I'd like to further group the results by unique sets of sites. For example, using the same data in that question:

原始数据

Sites 15 & 16 together share one permutation of a,b,c. Sites 18, 19, 20 share another permutation. Site 17 uses BOTH permutations.

I'd like to find a query with results similar to the linked question, but without any overlap in the site_ids:

在此处输入图片说明

I'm not sure that it's possible with just SQL, and I may just have to build something to crunch the data in code, but I'm crossing my fingers...

I'm using SQL Server, but it would also be nice to know how to do it with postgres.

UPDATE:
Someone suggested that Simulating group_concat MySQL function in Microsoft SQL Server 2005? might be a duplicate. From my (limited) understanding of group_concat, it appears that it might duplicate the question referenced above , but not this question. This question needs no overlap in site_ids.

I don't know if this is the most efficient way to do it, but it sure was fun coming up with it. :)

;WITH CTE_unique_groups AS (SELECT DISTINCT a, b, c FROM dbo.My_Table),
CTE_numbered_groups AS (SELECT a, b, c, ROW_NUMBER() OVER(ORDER BY a, b, c) AS row_num FROM CTE_unique_groups),
CTE_exponented_groups AS (SELECT a, b, c, POWER(2, row_num) AS group_value FROM CTE_numbered_groups),
CTE_unique_sets AS (
    SELECT
        MT.site_id,
        SUM(grps.group_value) AS total_group_value
    FROM
        CTE_exponented_groups AS grps
    INNER JOIN dbo.My_Table MT ON MT.a = grps.a AND MT.b = grps.b AND MT.c = grps.c
    GROUP BY
        MT.site_id
),
CTE_grouped_sites AS
(SELECT total_group_value, STUFF((SELECT ', ' + CAST(site_id AS VARCHAR(10)) FROM CTE_unique_sets t2 WHERE t2.total_group_value = t1.total_group_value FOR XML PATH('')), 1, 2, '') AS site_ids
FROM CTE_unique_sets t1)
SELECT DISTINCT
    x.a, x.b, x.c, gs.site_ids
FROM
    CTE_grouped_sites gs
INNER JOIN CTE_exponented_groups x ON
    gs.total_group_value & x.group_value = x.group_value

The gist of it is that first you have to identify every unique set of groups. Once you've done that, you can then use the bitwise & operator to explode the site IDs back into the groups to which they belonged.

If you're okay with an array (which sort of manifests itself as a comma delimited string the way you have it), here is a one possible approach in PostgreSQL.

with perms as (
  select
    f.a, f.b, f.c, f.site_id,
    count(1) over (partition by f.site_id) as cnt
  from
    your_table f
)
select
  p1.a, p1.b, p1.c,
  array (select p2.site_id
         from perms p2
         where
           p1.a = p2.a and
           p1.b = p2.b and
           p1.c = p2.c and
           p1.cnt = p2.cnt)
from perms p1
group by
  p1.a, p1.b, p1.c, p1.cnt

The results would look like this:

bar    baz    blah    {20,18,19}
foo    bar    baz     {17}
foo    bar    baz     {16,15}
bar    baz    blah    {17}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM