JOIN with multiple columns in PostgreSQL

I have the following two tables in PostgreSQL:

     TABLE: act_codes
    ===================
     act_code  act_desc
    ____________________
        1      sleeping
        2      commuting
        3      eating
        4      working

     TABLE: data
    ===================
    act1_1     act1_2      act1_3     act1_4
    ---------------------------------------------
      1         1           3           4
      1         2           2           3
      1         1           2           2
      1         2           2           3
      1         1           1           2
      1         1           3           4
      1         2           2           4
      1         1           1           3
      1         3           3           4
      1         1           4           4

The act_codes table is basically a table of activities (with a code and a description), and the data table contains the activity codes for (in this case) 4 different times (act1_1, act1_2, act1_3 and act1_4).
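To try the queries on this page, the sample data can be recreated with something like the script below. This is only a sketch: the integer and text column types are assumptions (the original tables are not shown as DDL), the key column is named act_code because that is what the queries use, and the schema name stackoverflow is chosen only because the information_schema queries further down filter on it.

```sql
-- Assumed setup; types and schema name are guesses, not taken from the question.
CREATE SCHEMA IF NOT EXISTS stackoverflow;
SET search_path TO stackoverflow, public;

CREATE TABLE act_codes (
    act_code integer PRIMARY KEY,
    act_desc text NOT NULL
);

INSERT INTO act_codes (act_code, act_desc) VALUES
    (1, 'sleeping'),
    (2, 'commuting'),
    (3, 'eating'),
    (4, 'working');

CREATE TABLE data (
    act1_1 integer,
    act1_2 integer,
    act1_3 integer,
    act1_4 integer
);

-- The ten sample rows from the table above, in the same order.
INSERT INTO data (act1_1, act1_2, act1_3, act1_4) VALUES
    (1, 1, 3, 4),
    (1, 2, 2, 3),
    (1, 1, 2, 2),
    (1, 2, 2, 3),
    (1, 1, 1, 2),
    (1, 1, 3, 4),
    (1, 2, 2, 4),
    (1, 1, 1, 3),
    (1, 3, 3, 4),
    (1, 1, 4, 4);
```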

I am trying to query this to get a table of counts for each activity. I have managed to do this for each individual column (in this case act1_4) like this:

    SELECT A.act_code, A.act_desc, COUNT (act1_4) 
    FROM act_codes AS A
    LEFT JOIN data AS D 
    ON D.act1_4 = A.act_code
    GROUP BY A.act_code, A.act_desc;   

This works fine for that column, but I have a very large number of columns to work through, so I would prefer a way to do this for all of them within a single SQL query.


I now have the following query (many thanks to banazs):

    SELECT
        ac.act_code, 
        ac.act_desc,
        act_time,
        COUNT(activity) AS act_count
    FROM
        (SELECT
            UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS act_time,
            UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS activity
        FROM
            data d) t
    RIGHT JOIN
        act_codes ac ON t.activity = ac.act_code
    GROUP BY
        ac.act_code, 
        ac.act_desc,
        act_time, activity
    ORDER BY 
        activity, 
        act_time
    ;

Which outputs:

    act_code        act_desc        act_time        act_count
    ---------------------------------------------------------
        1           sleeping            act1_1          10
        1           sleeping            act1_2          6
        1           sleeping            act1_3          2
        2           commuting           act1_2          3
        2           commuting           act1_3          4
        2           commuting           act1_4          2
        3           eating              act1_2          1
        3           eating              act1_3          3
        3           eating              act1_4          3
        4           working             act1_3          1
        4           working             act1_4          5

Which is basically what I was looking for. Ideally, the rows with zero counts could be added in somehow, but I am guessing that this is perhaps best done as a separate process (e.g. constructing a crosstab in R or something).

You can "unpivot" the data using UNNEST:

    SELECT
        UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS column_name,
        UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS value
    FROM
        data d
    ;
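As a side note, the two parallel UNNEST calls rely on both arrays having the same length. One alternative that keeps each column name next to its value is a LATERAL join over a VALUES list. The sketch below assumes PostgreSQL 9.3 or later and integer activity columns; the alias v is arbitrary.

```sql
-- Unpivot with LATERAL VALUES: one output row per (data row, column) pair.
SELECT
    v.column_name,
    v.value
FROM
    data d
CROSS JOIN LATERAL
    (VALUES
        ('act1_1', d.act1_1),
        ('act1_2', d.act1_2),
        ('act1_3', d.act1_3),
        ('act1_4', d.act1_4)
    ) AS v (column_name, value)
;
```

It should return the same set of rows as the UNNEST version; which form reads better is largely a matter of taste.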

Count the activities:

    SELECT
        ac.act_code,
        ac.act_desc,
        COUNT(*)
    FROM
        (SELECT
            UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS column_name,
            UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS val
        FROM
            data d) t
    INNER JOIN
        act_codes ac ON t.val = ac.act_code
    GROUP BY
        ac.act_code,
        ac.act_desc
    ;

Thanks @banazs, that is really useful in terms of helping me understand how to structure queries like this.

However, I still have difficulty arranging the query to split the output so that there is a column of counts for each time. Apologies, I think the labeling here is a bit confusing (act1_1 refers to activities done at time_1, act1_2 refers to time_2, etc.). The result I am trying to get looks like this:

    act_code    act_desc        count_act1_1    count_act1_2    count_act1_3    count_act1_4
    ----------------------------------------------------------------------------------------
        1       sleeping            10              6               2               0
        2       commuting           0               3               4               2
        3       eating              0               1               3               3
        4       working             0               0               1               5

I am not concerned about the output being in columns (I can easily reshape it), but it is important that the zeros are present in the table. Is this possible?

To achieve the table described above, the query needs to be redesigned a bit.

First you have to create an auxiliary table which contains the Cartesian product of the column names and the activities:

    SELECT
        *
    FROM
        act_codes ac
    -- if you have lots of columns, you can query their
    -- names from the information_schema.columns view
    CROSS JOIN -- CROSS JOIN pairs every row of one table with every row of the other
        (SELECT
            column_name
        FROM
            information_schema.columns
        WHERE
            table_schema = 'stackoverflow'
            AND table_name = 'data'
            AND column_name LIKE 'act%') cn
    ;

Then add the number of activities to this:

    SELECT
        ac.act_code,
        ac.act_desc,
        cn.column_name,
        -- COALESCE substitutes zero where the joined count is NULL
        COALESCE(ad.act_no, 0) AS act_no
    FROM
        act_codes ac
    CROSS JOIN
        (SELECT
            column_name
        FROM
            information_schema.columns
        WHERE
            table_schema = 'stackoverflow'
            AND table_name = 'data'
            AND column_name LIKE 'act%') cn
    -- LEFT JOIN is needed to preserve all rows
    -- of the Cartesian product
    LEFT JOIN
        (SELECT
            t.column_name,
            t.act_code,
            COUNT(*) AS act_no
        FROM
            (SELECT
                UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS column_name,
                UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS act_code
            FROM
                data d) t
        GROUP BY
            t.column_name,
            t.act_code) ad ON ad.act_code = ac.act_code AND ad.column_name = cn.column_name
    ;

Formatting the result to look like yours is possible, but a little bit messy. You need to create two tables: the first has to contain the result set of the previous query, the second the column names.

    CREATE TABLE acts AS
        SELECT
            ac.act_code,
            ac.act_desc,
            cn.column_name,
            COALESCE(ad.act_no, 0) AS act_no
        FROM
            act_codes ac
        CROSS JOIN
            (SELECT
                column_name
            FROM
                information_schema.columns
            WHERE
                table_schema = 'stackoverflow'
                AND table_name = 'data'
                AND column_name LIKE 'act%') cn
        LEFT JOIN
            (SELECT
                t.column_name,
                t.act_code,
                COUNT(*) AS act_no
            FROM
                (SELECT
                    UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS column_name,
                    UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS act_code
                FROM
                    data d) t
            GROUP BY
                t.column_name,
                t.act_code) ad ON ad.act_code = ac.act_code AND ad.column_name = cn.column_name
    ;

    CREATE TABLE column_names AS
        SELECT
            column_name
        FROM
            information_schema.columns
        WHERE
            table_schema = 'stackoverflow'
            AND table_name = 'data'
            AND column_name LIKE 'act%'
    ;

Install the tablefunc extension:

    CREATE EXTENSION tablefunc;

It provides the crosstab() function, which you can use to get the described output:

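    SELECT
        *
    FROM
        crosstab(
            -- source rows: row name, category, value; must be ordered by row name
            'SELECT act_desc, column_name, act_no FROM acts ORDER BY 1',
            -- categories, ordered so they line up with the output columns below
            'SELECT column_name FROM column_names ORDER BY 1'
        )
    AS
        ct (
            "act_desc" text,
            "act1_1" int,
            "act1_2" int,
            "act1_3" int,
            "act1_4" int
        )
    ;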

+-----------+--------+--------+--------+--------+
| act_desc  | act1_1 | act1_2 | act1_3 | act1_4 |
+-----------+--------+--------+--------+--------+
| commuting |      0 |      3 |      4 |      2 |
| eating    |      0 |      1 |      3 |      3 |
| sleeping  |     10 |      6 |      2 |      0 |
| working   |      0 |      0 |      1 |      5 |
+-----------+--------+--------+--------+--------+
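If the set of columns is small and known in advance, the same wide layout can probably be produced without tablefunc or the helper tables, using one filtered aggregate per column. This is a sketch, assuming PostgreSQL 9.4 or later (for the FILTER clause); the count_act1_N output names simply follow the layout requested in the question. It does not scale to many columns without generating the SQL, so it complements the crosstab() approach rather than replacing it.

```sql
SELECT
    ac.act_code,
    ac.act_desc,
    -- each FILTER counts only the joined rows where that column holds this activity
    COUNT(*) FILTER (WHERE d.act1_1 = ac.act_code) AS count_act1_1,
    COUNT(*) FILTER (WHERE d.act1_2 = ac.act_code) AS count_act1_2,
    COUNT(*) FILTER (WHERE d.act1_3 = ac.act_code) AS count_act1_3,
    COUNT(*) FILTER (WHERE d.act1_4 = ac.act_code) AS count_act1_4
FROM
    act_codes ac
-- LEFT JOIN keeps activities that never occur; their counts come out as 0
LEFT JOIN
    data d ON ac.act_code IN (d.act1_1, d.act1_2, d.act1_3, d.act1_4)
GROUP BY
    ac.act_code,
    ac.act_desc
ORDER BY
    ac.act_code
;
```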
