JOIN with multiple columns in PostgreSQL
I have the following two tables in PostgreSQL:
TABLE: act_codes
===================
act_code act_desc
____________________
1 sleeping
2 commuting
3 eating
4 working
TABLE: data
===================
act1_1 act1_2 act1_3 act1_4
---------------------------------------------
1 1 3 4
1 2 2 3
1 1 2 2
1 2 2 3
1 1 1 2
1 1 3 4
1 2 2 4
1 1 1 3
1 3 3 4
1 1 4 4
The act_codes table is basically a table of activities (with a code and a description), and the data table contains the activity codes for (in this case) 4 different times (act1_1, act1_2, act1_3 and act1_4).
I am trying to query this to get a table of counts for each activity. I have managed to do this for each individual column (in this case act1_4) like this:
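For anyone who wants to reproduce the queries in this thread, the two tables can be set up roughly as follows. This is only a sketch: the integer and text column types and the primary key are assumptions (the question does not state them), and the code column is named act_code because that is the name the queries use.

```sql
-- Hypothetical DDL: column types and the primary key are assumed,
-- the question only shows the table contents.
CREATE TABLE act_codes (
    act_code integer PRIMARY KEY,
    act_desc text NOT NULL
);

CREATE TABLE data (
    act1_1 integer,
    act1_2 integer,
    act1_3 integer,
    act1_4 integer
);

-- Rows copied from the tables shown in the question.
INSERT INTO act_codes (act_code, act_desc) VALUES
    (1, 'sleeping'),
    (2, 'commuting'),
    (3, 'eating'),
    (4, 'working');

INSERT INTO data (act1_1, act1_2, act1_3, act1_4) VALUES
    (1, 1, 3, 4),
    (1, 2, 2, 3),
    (1, 1, 2, 2),
    (1, 2, 2, 3),
    (1, 1, 1, 2),
    (1, 1, 3, 4),
    (1, 2, 2, 4),
    (1, 1, 1, 3),
    (1, 3, 3, 4),
    (1, 1, 4, 4);
```

The answers below filter information_schema.columns on a schema called 'stackoverflow'; if the tables are created in another schema (for example public), that filter has to be adjusted.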
SELECT A.act_code, A.act_desc, COUNT (act1_4)
FROM act_codes AS A
LEFT JOIN data AS D
ON D.act1_4 = A.act_code
GROUP BY A.act_code, A.act_desc;
This works fine for that column, but I have a very large number of columns to work through, so I would prefer a way to do this within a single SQL query.
I now have the following query (many thanks to banazs):
SELECT
ac.act_code,
ac.act_desc,
act_time,
COUNT(activity) AS act_count
FROM
(SELECT
UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS act_time,
UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS activity
FROM
data d) t
RIGHT JOIN
act_codes ac ON t.activity = ac.act_code
GROUP BY
ac.act_code,
ac.act_desc,
act_time, activity
ORDER BY
activity,
act_time
;
Which outputs:
act_code act_desc act_time act_count
---------------------------------------------------------
1 sleeping act1_1 10
1 sleeping act1_2 6
1 sleeping act1_3 2
2 commuting act1_2 3
2 commuting act1_3 4
2 commuting act1_4 2
3 eating act1_2 1
3 eating act1_3 3
3 eating act1_4 3
4 working act1_3 1
4 working act1_4 5
This is basically what I was looking for. Ideally, the rows with zero counts could be added in somehow, but I am guessing that this is perhaps best done as a separate process (e.g. constructing a crosstab in R or something).
You can "unpivot" the data using UNNEST:
SELECT
UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS column_name,
UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS value
FROM
data d
;
Count the activities:
SELECT
ac.act_code,
ac.act_desc,
COUNT(*)
FROM
(SELECT
UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS column_name,
UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS val
FROM
data d) t
INNER JOIN
act_codes ac ON t.val = ac.act_code
GROUP BY
ac.act_code,
ac.act_desc
;
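As a possible variation, the same unpivot can be written with a LATERAL VALUES list, which pairs each column name with its value explicitly instead of relying on two parallel UNNEST calls staying in step. This is a sketch rather than part of the original answer; it assumes PostgreSQL 9.3 or later (for LATERAL), and the aliases t, column_name, val and act_count are only illustrative.

```sql
-- Sketch: unpivot with LATERAL VALUES, then count per activity.
-- Each data row is expanded into four (column_name, val) rows.
SELECT
    ac.act_code,
    ac.act_desc,
    COUNT(*) AS act_count
FROM
    data d
CROSS JOIN LATERAL
    (VALUES
        ('act1_1', d.act1_1),
        ('act1_2', d.act1_2),
        ('act1_3', d.act1_3),
        ('act1_4', d.act1_4)
    ) AS t (column_name, val)
INNER JOIN
    act_codes ac ON t.val = ac.act_code
GROUP BY
    ac.act_code,
    ac.act_desc
ORDER BY
    ac.act_code;
```

With the sample data from the question this should give 18, 9, 7 and 6 for activities 1 to 4 (40 values in total, one per cell of the data table).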
Thanks @banazs, that is really useful in terms of helping me understand how to structure queries like this.
However, I still have difficulty arranging the query to split the output so that there is a column of counts for each time. Apologies, I think the labeling here is a bit confusing (act1_1 refers to activities done at time_1, act1_2 to time_2, etc.). The result I am trying to get looks like this:
act_code act_desc count_act1_1 count_act1_2 count_act1_3 count_act1_4
----------------------------------------------------------------------------------------
1 sleeping 10 6 2 0
2 commuting 0 3 4 2
3 eating 0 1 3 3
4 working 0 0 1 5
I am not concerned about the output being in columns (I can easily reshape it), but it is important that the zeros are present in the table. Is this possible?
To achieve the table described above, the query needs to be redesigned a bit.
First you have to create an auxiliary table which contains the Cartesian product of the column names and the activities:
SELECT
*
FROM
act_codes ac
-- if you have lots of columns you can query their
-- names from the information_schema.columns system
-- view
CROSS JOIN -- CROSS JOIN combines every row of one table with every row of the other
(SELECT
column_name
FROM
information_schema.columns
WHERE
table_schema = 'stackoverflow'
AND table_name = 'data'
AND column_name LIKE 'act%') cn
;
Add the number of activities to this:
SELECT
ac.act_code,
ac.act_desc,
cn.column_name,
-- COALESCE substitutes zero where the joined count is NULL
COALESCE(ad.act_no, 0) AS act_no
FROM
act_codes ac
CROSS JOIN
(SELECT
column_name
FROM
information_schema.columns
WHERE
table_schema = 'stackoverflow'
AND table_name = 'data'
AND column_name LIKE 'act%') cn
-- you need to use LEFT JOIN to preserve all rows
-- from the cartesian product
LEFT JOIN
(SELECT
t.column_name,
t.act_code,
COUNT(*) AS act_no
FROM
(SELECT
UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS column_name,
UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS act_code
FROM
data d) t
GROUP BY
t.column_name,
t.act_code) ad ON ad.act_code = ac.act_code AND ad.column_name = cn.column_name
;
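The query above still lists the column names by hand inside the two UNNEST arrays. One way to avoid that, sketched here under assumptions, is to turn each row into JSON and expand it into key/value pairs. It relies on to_jsonb() and jsonb_each_text(), so it needs PostgreSQL 9.5 or later, and it assumes that every column whose name starts with 'act' holds an integer code. The CTE name unpivoted and the aliases j, u and cn are illustrative.

```sql
-- Sketch: unpivot every act% column without naming the columns individually.
WITH unpivoted AS (
    SELECT
        j.key        AS column_name,
        j.value::int AS act_code      -- assumes integer codes in all act% columns
    FROM
        data d
    CROSS JOIN LATERAL
        jsonb_each_text(to_jsonb(d)) AS j (key, value)
    WHERE
        j.key LIKE 'act%'
)
SELECT
    ac.act_code,
    ac.act_desc,
    cn.column_name,
    COUNT(u.act_code) AS act_no       -- counts matches only, so missing pairs give 0
FROM
    act_codes ac
CROSS JOIN
    (SELECT DISTINCT column_name FROM unpivoted) cn
LEFT JOIN
    unpivoted u ON u.act_code = ac.act_code AND u.column_name = cn.column_name
GROUP BY
    ac.act_code,
    ac.act_desc,
    cn.column_name
ORDER BY
    ac.act_code,
    cn.column_name;
```

Note that the column names here come from the data itself, so an empty data table returns no rows at all, whereas the information_schema lookup above would still list every column with a zero count.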
Formatting the result to look like yours is possible, but a little bit messy. You need to create two tables: the first has to contain the result set of the previous query, the second the column names.
CREATE TABLE acts AS
SELECT
ac.act_code,
ac.act_desc,
cn.column_name,
COALESCE(ad.act_no, 0) AS act_no
FROM
act_codes ac
CROSS JOIN
(SELECT
column_name
FROM
information_schema.columns
WHERE
table_schema = 'stackoverflow'
AND table_name = 'data'
AND column_name LIKE 'act%') cn
LEFT JOIN
(SELECT
t.column_name,
t.act_code,
COUNT(*) AS act_no
FROM
(SELECT
UNNEST(array['act1_1','act1_2','act1_3','act1_4']) AS column_name,
UNNEST(array[d.act1_1, d.act1_2, d.act1_3, d.act1_4]) AS act_code
FROM
data d) t
GROUP BY
t.column_name,
t.act_code) ad ON ad.act_code = ac.act_code AND ad.column_name = cn.column_name
;
CREATE TABLE column_names AS
SELECT
column_name
FROM
information_schema.columns
WHERE
table_schema = 'stackoverflow'
AND table_name = 'data'
AND column_name LIKE 'act%'
;
Install the tablefunc extension.
CREATE EXTENSION tablefunc;
It provides the crosstab() function, which you can use to get the described output.
SELECT
*
FROM
crosstab(
'SELECT act_desc, column_name, act_no FROM acts ORDER BY 1',
'SELECT column_name FROM column_names ORDER BY 1'
)
AS
ct (
"act_desc" text,
"act1_1" int,
"act1_2" int,
"act1_3" int,
"act1_4" int
);
+-----------+--------+--------+--------+--------+
| act_desc | act1_1 | act1_2 | act1_3 | act1_4 |
+-----------+--------+--------+--------+--------+
| commuting | 0 | 3 | 4 | 2 |
| eating | 0 | 1 | 3 | 3 |
| sleeping | 10 | 6 | 2 | 0 |
| working | 0 | 0 | 1 | 5 |
+-----------+--------+--------+--------+--------+
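If the number of time columns is manageable, a similar wide result with zeros can probably be produced without the helper tables or the tablefunc extension, using conditional aggregation. The following is a sketch, not the original answer's method: it assumes PostgreSQL 9.4 or later for the FILTER clause, and the count_act1_N aliases simply follow the layout requested in the question.

```sql
-- Sketch: one FILTER-ed count per time column.
-- The LEFT JOIN keeps activities that never occur, so they show 0 everywhere.
SELECT
    ac.act_code,
    ac.act_desc,
    COUNT(*) FILTER (WHERE d.act1_1 = ac.act_code) AS count_act1_1,
    COUNT(*) FILTER (WHERE d.act1_2 = ac.act_code) AS count_act1_2,
    COUNT(*) FILTER (WHERE d.act1_3 = ac.act_code) AS count_act1_3,
    COUNT(*) FILTER (WHERE d.act1_4 = ac.act_code) AS count_act1_4
FROM
    act_codes ac
LEFT JOIN
    data d ON ac.act_code IN (d.act1_1, d.act1_2, d.act1_3, d.act1_4)
GROUP BY
    ac.act_code,
    ac.act_desc
ORDER BY
    ac.act_code;
```

The trade-off is that every time column has to be written out once in the SELECT list and once in the join condition, so for a very large number of columns the statement would have to be generated (for example from information_schema.columns) rather than typed by hand.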