
SQL Aggregation with SUM, GROUP BY and JOIN (many-to-many)

Here's an example Table layout:

TABLE_A:                    TABLE_B:     TABLE_A_B:
id | a     | b    | c       id | name    a_id | b_id
---------------------       ---------    -----------
1  | true  | X    | A       1  | A       1    | 1
2  | true  | Z    | null    2  | B       1    | 2
3  | false | X    | null    3  | C       2    | 2
4  | true  | Y    | Q                    4    | 1
5  | false | null | null                 4    | 2
                                         5    | 1

Possible Values:

  • TABLE_A.a: true, false
  • TABLE_A.b: X, Y, Z (or null)
  • TABLE_A.c: A, B, C, ... basically arbitrary (or null)
  • TABLE_B.name: A, B, C, ... basically arbitrary

What I want to achieve:

SELECT all rows from TABLE_A
  SUM(where a = true),
  SUM(where a = false),
  SUM(where b = 'X'),
  SUM(where b = 'Y'),
  SUM(where b = 'Z'),
  SUM(where b IS NULL),
and also get the SUMs for all distinct TABLE_A.c values.
and also get the SUMs for all those TABLE_A_B relations.

The result for the example Table above should look like:

aTrue | aFalse | bX | bY | bZ | bNull | cA | cQ | cNull | nameA | nameB | nameC
-------------------------------------------------------------------------------
3     | 2      | 2  | 1  | 1  | 1     | 1  | 1  | 3     | 3     | 3     | 0

What I've done so far:

SELECT
  SUM(CASE WHEN a = true THEN 1 ELSE 0 END) AS aTrue,
  SUM(CASE WHEN a = false THEN 1 ELSE 0 END) AS aFalse,
  SUM(CASE WHEN b = 'X' THEN 1 ELSE 0 END) AS bX,
  ...
FROM TABLE_A

What's my problem?

Counting the values of TABLE_A.a and TABLE_A.b is easy, because there's a fixed number of possible values.

But I can't figure out how to count each distinct value of TABLE_A.c. The JOINed TABLE_B has basically the same problem, because the number of values within TABLE_B is unknown and can change over time.

Thanks for your help! :)

EDIT1: New (preferred) SQL result structure:

column         | value | sum
----------------------------
TABLE_A.a      | true  | 3
TABLE_A.a      | false | 2
TABLE_A.b      | X     | 2
TABLE_A.b      | Y     | 1
TABLE_A.b      | Z     | 1
TABLE_A.b      | null  | 1
TABLE_A.c      | A     | 1
TABLE_A.c      | Q     | 1
TABLE_A.c      | null  | 3
TABLE_B.name   | A     | 3
TABLE_B.name   | B     | 3
TABLE_B.name   | C     | 0

This follows your original request of a single row acting as a simulated pivot. SUM( logical condition ) works in engines that treat a boolean expression as an integer (MySQL, for example): the condition evaluates to 1 if true and 0 if false. Since column "a" is itself true or false, SUM( a ) counts the true rows and SUM( NOT a ) counts the false rows (NOT FALSE = TRUE). The "b" column works the same way: b = 'X' counts as 1 when it holds and 0 otherwise.

In other SQL engines, you might have to write it as SUM( CASE WHEN ... THEN 1 ELSE 0 END ).

Since the counts for the two tables don't rely on each other, each set of SUM()s can go into its own subquery (aliased pqA and pqB, for pre-query A and pre-query B respectively). With no GROUP BY, each subquery returns exactly one row. Listing them without a join condition creates a Cartesian product, but one row times one row is still a single record holding all the columns you want.
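To make the equivalence concrete, here is a minimal sketch that runs both spellings against the sample TABLE_A. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in engine, with the boolean column stored as 0/1; the question does not say which database is in use, so treat the setup as an assumption.

```python
import sqlite3

# Assumption: SQLite stands in for the real engine; booleans are stored as 0/1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, a INTEGER, b TEXT, c TEXT);
    INSERT INTO table_a VALUES
        (1, 1, 'X', 'A'), (2, 1, 'Z', NULL), (3, 0, 'X', NULL),
        (4, 1, 'Y', 'Q'), (5, 0, NULL, NULL);
""")

# Short form: a comparison evaluates to 1 or 0 (or NULL, which SUM skips).
short_form = conn.execute("""
    SELECT SUM(a), SUM(NOT a), SUM(b = 'X'), SUM(b IS NULL)
      FROM table_a
""").fetchone()

# Portable form: the same counts spelled out with CASE WHEN.
case_form = conn.execute("""
    SELECT SUM(CASE WHEN a = 1 THEN 1 ELSE 0 END),
           SUM(CASE WHEN a = 0 THEN 1 ELSE 0 END),
           SUM(CASE WHEN b = 'X' THEN 1 ELSE 0 END),
           SUM(CASE WHEN b IS NULL THEN 1 ELSE 0 END)
      FROM table_a
""").fetchone()

print(short_form)  # (3, 2, 2, 1)
print(case_form)   # (3, 2, 2, 1)
```

Both queries return aTrue = 3, aFalse = 2, bX = 2 and bNull = 1, matching the expected result in the question.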

SELECT 
      pqA.*, pqB.*
   from
      ( SELECT
              SUM( ta.a ) aTrue,
              SUM( NOT ta.a ) aFalse,
              SUM( ta.b = 'X' ) bX,
              SUM( ta.b = 'Y' ) bY,
              SUM( ta.b = 'Z' ) bZ,
              SUM( ta.b is null ) bNULL,
              SUM( ta.c = 'A' ) cA,
              SUM( ta.c = 'Q' ) cQ,
              SUM( ta.c is null ) cNULL,
              COUNT( distinct ta.c ) DistC
           from
              table_a ta ) pqA,
      ( SELECT
              SUM( b.Name = 'A' ) nameA,
              SUM( b.Name = 'B' ) nameB,
              SUM( b.Name = 'C' ) nameC
           from
              table_a_b t_ab 
                 join table_b b
                    ON t_ab.b_id = b.id ) pqB

This option gives your second (preferred) output

SELECT
      MAX( 'TABLE_A.a   ' ) as Basis,
      CASE when a then 'true' else 'false' end Value,
      COUNT(*) finalCnt
   from
      TABLE_A
   group by
      a
UNION ALL
SELECT
      MAX( 'TABLE_A.b   ' ) as Basis,
      b Value,
      COUNT(*) finalCnt
   from
      TABLE_A
   group by
      b
UNION ALL
SELECT
      MAX( 'TABLE_A.c   ' ) as Basis,
      c Value,
      COUNT(*) finalCnt
   from
      TABLE_A
   group by
      c
UNION ALL
SELECT
      MAX( 'TABLE_B.name   ' ) as Basis,
      b.Name Value,
      -- count matched relation rows only, so a name with no relations reports 0
      COUNT( t_ab.b_id ) finalCnt
   from
      table_b b
         left join table_a_b t_ab
            ON t_ab.b_id = b.id 
   group by
      b.Name
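As a quick check of the long-format result, the sketch below loads the sample tables and runs a UNION ALL query of the same shape. It assumes SQLite via Python's sqlite3 module as a stand-in engine, and it starts the last branch from table_b with a LEFT JOIN so that a name without any relation (C in the sample data) is reported with a count of 0, as the preferred output asks for. The column aliases basis, val and cnt are illustrative only.

```python
import sqlite3

# Assumption: SQLite stands in for the real engine; booleans are stored as 0/1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, a INTEGER, b TEXT, c TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table_a_b (a_id INTEGER, b_id INTEGER);
    INSERT INTO table_a VALUES
        (1, 1, 'X', 'A'), (2, 1, 'Z', NULL), (3, 0, 'X', NULL),
        (4, 1, 'Y', 'Q'), (5, 0, NULL, NULL);
    INSERT INTO table_b VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO table_a_b VALUES (1, 1), (1, 2), (2, 2), (4, 1), (4, 2), (5, 1);
""")

rows = conn.execute("""
    SELECT 'TABLE_A.a' AS basis,
           CASE WHEN a THEN 'true' ELSE 'false' END AS val,
           COUNT(*) AS cnt
      FROM table_a GROUP BY a
    UNION ALL
    SELECT 'TABLE_A.b', b, COUNT(*) FROM table_a GROUP BY b
    UNION ALL
    SELECT 'TABLE_A.c', c, COUNT(*) FROM table_a GROUP BY c
    UNION ALL
    -- COUNT(t_ab.b_id) ignores the NULLs produced by unmatched names
    SELECT 'TABLE_B.name', tb.name, COUNT(t_ab.b_id)
      FROM table_b tb
      LEFT JOIN table_a_b t_ab ON t_ab.b_id = tb.id
     GROUP BY tb.name
""").fetchall()

# Row order across the branches is not guaranteed, so index by (basis, val).
counts = {(basis, val): cnt for basis, val, cnt in rows}
for key in sorted(counts, key=str):
    print(key, counts[key])
```

Because every branch groups on the column itself, new values of TABLE_A.c or TABLE_B.name show up as extra rows without any change to the query text.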

I think you will need to build a dynamic query, since you don't know the possible values of column c in TABLE_A in advance. You can write a stored procedure that reads the list of distinct values of column c into a variable and then builds the query text in a loop (a "DO WHILE", for example). Please let me know if you need more detailed help with dynamic SQL.
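The idea can be sketched outside a stored procedure as well. The example below builds the query on the client side in Python instead of in a stored procedure with a loop, which is a swapped-in technique and not what the answer literally describes. It assumes SQLite via the sqlite3 module, and quote_ident is a hypothetical helper introduced only for this illustration.

```python
import sqlite3

# Assumption: SQLite stands in for the real engine; booleans are stored as 0/1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, a INTEGER, b TEXT, c TEXT);
    INSERT INTO table_a VALUES
        (1, 1, 'X', 'A'), (2, 1, 'Z', NULL), (3, 0, 'X', NULL),
        (4, 1, 'Y', 'Q'), (5, 0, NULL, NULL);
""")


def quote_ident(name):
    """Hypothetical helper: quote a string for use as a column alias."""
    return '"' + name.replace('"', '""') + '"'


# Step 1: read the distinct non-null values of c that exist right now.
distinct_c = [row[0] for row in conn.execute(
    "SELECT DISTINCT c FROM table_a WHERE c IS NOT NULL ORDER BY c")]

# Step 2: emit one SUM(CASE ...) column per value. The values themselves are
# passed as bound parameters; only the alias is spliced into the SQL text.
columns = []
params = []
for value in distinct_c:
    alias = quote_ident("c" + value)
    columns.append("SUM(CASE WHEN c = ? THEN 1 ELSE 0 END) AS " + alias)
    params.append(value)
columns.append('SUM(CASE WHEN c IS NULL THEN 1 ELSE 0 END) AS "cNull"')

query = "SELECT " + ", ".join(columns) + " FROM table_a"

# Step 3: run the generated query and pair column names with the counts.
cursor = conn.execute(query, params)
names = [col[0] for col in cursor.description]
result = dict(zip(names, cursor.fetchone()))
print(result)  # {'cA': 1, 'cQ': 1, 'cNull': 3}
```

The same two steps (collect the distinct values, then generate one column per value) apply to TABLE_B.name. The column list changes whenever the data changes, so the query has to be regenerated before each run.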
