
JOIN with multiple columns in postgresql

I have the following two tables in PostgreSQL:

     TABLE: act_codes
    ===================
     act_code  act_desc
    ____________________
        1      sleeping
        2      commuting
        3      eating
        4      working

     TABLE: data
    ===================
    act1_1     act1_2      act1_3     act1_4
    ---------------------------------------------
      1         1           3           4
      1         2           2           3
      1         1           2           2
      1         2           2           3
      1         1           1           2
      1         1           3           4
      1         2           2           4
      1         1           1           3
      1         3           3           4
      1         1           4           4

The act_codes table is basically a table of activities (with a code and a description), and the data table contains the activity codes for (in this case) 4 different times (act1_1, act1_2, act1_3 and act1_4).
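
To reproduce the queries in this thread, the two tables can be set up roughly as follows. The column types (integer and text) and the primary key are assumptions, since the question only shows the values; the column names follow the queries.

```sql
-- Sketch of the sample schema; column types and the key are assumed,
-- they are not stated in the question.
CREATE TABLE act_codes (
    act_code integer PRIMARY KEY,
    act_desc text NOT NULL
);

INSERT INTO act_codes (act_code, act_desc) VALUES
    (1, 'sleeping'),
    (2, 'commuting'),
    (3, 'eating'),
    (4, 'working');

CREATE TABLE data (
    act1_1 integer,
    act1_2 integer,
    act1_3 integer,
    act1_4 integer
);

-- The ten sample rows from the question, in the same order.
INSERT INTO data (act1_1, act1_2, act1_3, act1_4) VALUES
    (1, 1, 3, 4),
    (1, 2, 2, 3),
    (1, 1, 2, 2),
    (1, 2, 2, 3),
    (1, 1, 1, 2),
    (1, 1, 3, 4),
    (1, 2, 2, 4),
    (1, 1, 1, 3),
    (1, 3, 3, 4),
    (1, 1, 4, 4);
```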

I am trying to query this to get a table of counts for each activity. I have managed to do this for each individual column (in this case act1_4) like this:

    SELECT A.act_code, A.act_desc, COUNT (act1_4) 
    FROM act_codes AS A
    LEFT JOIN data AS D 
    ON D.act1_4 = A.act_code
    GROUP BY A.act_code, A.act_desc;   

This works fine for that column, but I have a very large number of columns to work through, so I would prefer a way to do this for all of them in a single SQL query.


I now have the following query (many thanks to banazs):

    SELECT
        ac.act_code, 
        ac.act_desc,
        act_time,
        COUNT(activity) AS act_count
    FROM
        (SELECT
            UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS act_time,
            UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS activity
        FROM
            data d) t
    RIGHT JOIN
        act_codes ac ON t.activity = ac.act_code
    GROUP BY
        ac.act_code, 
        ac.act_desc,
        act_time, activity
    ORDER BY 
        activity, 
        act_time
    ;

Which outputs:

    act_code        act_desc        act_time        act_count
    ---------------------------------------------------------
        1           sleeping            act1_1          10
        1           sleeping            act1_2          6
        1           sleeping            act1_3          2
        2           commuting           act1_2          3
        2           commuting           act1_3          4
        2           commuting           act1_4          2
        3           eating              act1_2          1
        3           eating              act1_3          3
        3           eating              act1_4          3
        4           working             act1_3          1
        4           working             act1_4          5

This is basically what I was looking for. Ideally, the rows with zero counts could be added in somehow, but I am guessing that this is perhaps best done as a separate process (e.g. constructing a crosstab in R or something).

You can "unpivot" the data using UNNEST:

    SELECT
        UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS column_name,
        UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS value
    FROM
        data d
    ;
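
An equivalent way to write the unpivot is a LATERAL join against a VALUES list. This is a sketch of an alternative, not part of the original answer. It keeps each column name next to its value, so the two lists cannot drift out of step when more columns are added. It assumes PostgreSQL 9.3 or later, where LATERAL is available; the alias v is illustrative.

```sql
-- Alternative unpivot: one output row per (data row, column) pair.
SELECT
    v.column_name,
    v.val
FROM
    data d
CROSS JOIN LATERAL (
    VALUES
        ('act1_1', d.act1_1),
        ('act1_2', d.act1_2),
        ('act1_3', d.act1_3),
        ('act1_4', d.act1_4)
) AS v(column_name, val)
;
```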

Count the activities:

SELECT
    ac.act_code, 
    ac.act_desc,
    COUNT(*)
FROM
    (SELECT
        UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS column_name,
        UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS val
    FROM
        data d) t
INNER JOIN
    act_codes ac ON t.val = ac.act_code
GROUP BY
    ac.act_code, 
    ac.act_desc 
;

Thanks @banazs - that is really useful in terms of helping me understand how to structure queries like this.

However, I am still having difficulty arranging the query so that the output has a separate column of counts for each time. Apologies, I think the labeling here is a bit confusing (act1_1 refers to activities done at time_1, act1_2 to time_2, etc.). The result I am trying to get looks like this:

    act_code    act_desc        count_act1_1    count_act1_2    count_act1_3    count_act1_4
    ----------------------------------------------------------------------------------------
        1       sleeping            10              6               2               0
        2       commuting           0               3               4               2
        3       eating              0               1               3               3
        4       working             0               0               1               5

I am not concerned about the output being in columns - I can easily reshape it, but it is important that the zeros are present in the table. Is this possible?

To achieve the table described above, the query needs to be redesigned a bit.

First you have to create an auxiliary table which contains the cartesian product of the column names and the activities:

SELECT 
    *
FROM
    act_codes ac
-- if you have lots of columns, you can query their
-- names from the information_schema.columns system
-- view
CROSS JOIN -- CROSS JOIN pairs every row of one table with every row of the other
    (SELECT 
        column_name 
    FROM 
        information_schema.columns 
    WHERE 
        table_schema = 'stackoverflow' 
        AND table_name = 'data' 
        AND column_name LIKE 'act%') cn 
;
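
With a very large number of columns, listing them all in the UNNEST arrays gets tedious. One possible way around that is to turn each row into JSON and expand it into key/value pairs. This is a sketch, not part of the original answer. It assumes PostgreSQL 9.5 or later (for to_jsonb) and that every column matching 'act%' holds an integer activity code. The aliases kv, col and val are illustrative.

```sql
-- Unpivot without naming the columns: each row becomes a JSON object,
-- and jsonb_each_text() returns one (key, value) pair per column.
SELECT
    kv.col      AS column_name,
    kv.val::int AS act_code,
    COUNT(*)    AS act_no
FROM
    data d
CROSS JOIN LATERAL
    jsonb_each_text(to_jsonb(d)) AS kv(col, val)
WHERE
    kv.col LIKE 'act%'   -- keep only the activity columns
GROUP BY
    kv.col,
    kv.val
;
```

The result has the same three columns (column_name, act_code, act_no) as the grouped subquery aliased ad in the answer's LEFT JOIN, so it could stand in for that subquery.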

Adding the number of activities to this:

SELECT 
    ac.act_code,
    ac.act_desc,
    cn.column_name,
    -- COALESCE replaces NULL (no matching rows) with zero
    COALESCE(ad.act_no, 0) AS act_no
FROM
    act_codes ac
CROSS JOIN
    (SELECT 
        column_name
    FROM 
        information_schema.columns 
    WHERE 
        table_schema = 'stackoverflow' 
        AND table_name = 'data' 
        AND column_name LIKE 'act%') cn
-- you need to use LEFT JOIN to preserve all rows
-- from the cartesian product
LEFT JOIN
    (SELECT 
        t.column_name,
        t.act_code,
        COUNT(*) AS act_no
    FROM
        (SELECT
            UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS column_name,
            UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS act_code
        FROM
            data d) t
    GROUP BY
        t.column_name,
        t.act_code) ad ON ad.act_code = ac.act_code AND ad.column_name = cn.column_name 
;

Formatting the result to look like yours is possible, but a little bit messy. You need to create two tables: the first holds the result set of the previous query, the second the column names.

CREATE TABLE acts AS
    SELECT 
        ac.act_code,
        ac.act_desc,
        cn.column_name,
        COALESCE(ad.act_no ,0) AS act_no
    FROM
        act_codes ac
    CROSS JOIN
        (SELECT 
            column_name
        FROM 
            information_schema.columns 
        WHERE 
            table_schema = 'stackoverflow' 
            AND table_name = 'data' 
            AND column_name LIKE 'act%') cn
    LEFT JOIN
        (SELECT 
            t.column_name,
            t.act_code,
            COUNT(*) AS act_no
        FROM
            (SELECT
                UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS column_name,
                UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS act_code
            FROM
                data d) t
        GROUP BY
            t.column_name,
            t.act_code) ad ON ad.act_code = ac.act_code AND ad.column_name = cn.column_name 
;

CREATE TABLE column_names AS
    SELECT 
        column_name
    FROM 
        information_schema.columns 
    WHERE 
        table_schema = 'stackoverflow' 
        AND table_name = 'data' 
        AND column_name LIKE 'act%'
;

Install the tablefunc extension:

CREATE EXTENSION tablefunc;

It provides the crosstab() function, which you can use to get the described output.

SELECT 
    *
FROM   
    crosstab(
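        -- source rows: row name, category, value
        'SELECT act_desc, column_name, act_no FROM acts ORDER BY 1',
        -- categories, ordered so they line up with the output columns declared below
        'SELECT column_name FROM column_names ORDER BY 1'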
    )  
AS 
    ct (
        "act_desc" text, 
        "act1_1" int, 
        "act1_2" int, 
        "act1_3" int, 
        "act1_4" int
        );

+-----------+--------+--------+--------+--------+
| act_desc  | act1_1 | act1_2 | act1_3 | act1_4 |
+-----------+--------+--------+--------+--------+
| commuting |      0 |      3 |      4 |      2 |
| eating    |      0 |      1 |      3 |      3 |
| sleeping  |     10 |      6 |      2 |      0 |
| working   |      0 |      0 |      1 |      5 |
+-----------+--------+--------+--------+--------+
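
If installing tablefunc or creating helper tables is not an option, the same wide table can also be built with plain conditional aggregation. The sketch below is an alternative to the crosstab() approach, not part of the original answer. It assumes PostgreSQL 9.4 or later (for the FILTER clause) and still lists each column once. The count_act1_N aliases follow the layout requested in the question.

```sql
SELECT
    ac.act_code,
    ac.act_desc,
    -- one conditional count per time column
    COUNT(*) FILTER (WHERE d.act1_1 = ac.act_code) AS count_act1_1,
    COUNT(*) FILTER (WHERE d.act1_2 = ac.act_code) AS count_act1_2,
    COUNT(*) FILTER (WHERE d.act1_3 = ac.act_code) AS count_act1_3,
    COUNT(*) FILTER (WHERE d.act1_4 = ac.act_code) AS count_act1_4
FROM
    act_codes ac
-- LEFT JOIN keeps activities that never occur; their counts come out as 0
LEFT JOIN
    data d ON ac.act_code IN (d.act1_1, d.act1_2, d.act1_3, d.act1_4)
GROUP BY
    ac.act_code,
    ac.act_desc
ORDER BY
    ac.act_code
;
```

With the sample data this should return the four rows of the desired output shown in the question, zeros included.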
