Multiple sum/counts across multiple tables in PostgreSQL

I've searched through several suggestions on this site and haven't quite been able to get what I'm after. I suspect there's a syntax/punctuation issue that I'm missing.

I work on a database using phpPgAdmin that tracks lots of information related to a population of baboons being studied. I'm trying to make a query to identify, for each individual baboon, how many tissue samples of different types we have collected for them and how many DNA samples of different types we have for each of them. There are three tables that are pertinent to my problem:

Table: "biograph" has basic info about all the animals in the group, though the name is all I care about here.

name | birth
-----+-----------
A21  | 1968-07-01
AAR  | 2002-03-30
ABB  | 1998-09-10
ABD  | 2005-03-15
ABE  | 1986-01-01

Table: "babtissue" tracks information, including the three columns below, about different tissues that have been collected over the years. Some rows in this table represent tissue samples that we no longer have but that are still referred to elsewhere in the database, so the "avail" column helps us screen for samples that we still have around.

name | sample_type | avail
-----+-------------+------
A21  | BLOOD       | Y
A21  | BLOOD       | Y
A21  | TISSUE      | N
ABB  | BLOOD       | Y
ABB  | TISSUE      | Y

Table: "dna" is similar to babtissue.

name | sample_type | avail
-----+-------------+------
ABB  | GDNA        | N
ABB  | WGA         | Y
ACC  | WGA         | N
ALE  | GDNA        | Y
ALE  | GDNA        | Y

Altogether, I'm trying to write a query that returns every name from biograph and tells me, with one column each, how many 'BLOOD', 'TISSUE', 'GDNA', and 'WGA' samples I have for each individual. Something like...

name | bloodsamps | tissuesamps | gdnas | wgas | avail
-----+------------+-------------+-------+------+------
A21  | 2          | 0           | 0     | 0    | ?
AAR  | 0          | 0           | 0     | 0    | ?
ABB  | 1          | 1           | 0     | 1    | ?
ACC  | 0          | 0           | 0     | 0    | ?
ALE  | 0          | 0           | 2     | 0    | ?

(Apologies for the weird formatting above, I'm not very familiar with writing this way)

The latest version of the query that I've tried:

select b.name,  
sum(case when t.sample_type='BLOOD' and t.avail='Y' then 1 else 0 end) as bloodsamps,   
sum(case when t.sample_type='TISSUE' and t.avail='Y' then 1 else 0 end) as tissuesamps,   
sum(case when d.sample_type='GDNA' and d.avail='Y' then 1 else 0 end) as gdnas,  
sum(case when d.sample_type='WGA' and d.avail='Y' then 1 else 0 end) as wgas  
from biograph b  
left join babtissue t on b.name=t.name  
left join dna d on b.name=d.name  
where b.name is not NULL  
group by b.name  
order by b.name  

I don't receive any errors when doing it this way, but I know the numbers it gives me are wrong: they're too high. I figure this has something to do with my use of more than one join, and that something about my join syntax needs to change.

Any ideas?

The numbers are too high because you're joining to babtissue and then also to dna, so every babtissue row for a name gets paired with every dna row for that name, which causes duplicates.
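To see the duplication concretely, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database and runs the double join for ABB. SQLite (via Python's sqlite3 module) is only a stand-in for PostgreSQL here, and the column types are assumptions. ABB has two babtissue rows and two dna rows, so the join produces 2 x 2 = 4 rows, and its single available blood sample is counted twice.

```python
import sqlite3

# In-memory stand-in for the PostgreSQL database; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE biograph (name TEXT, birth TEXT);
    CREATE TABLE babtissue (name TEXT, sample_type TEXT, avail TEXT);
    CREATE TABLE dna (name TEXT, sample_type TEXT, avail TEXT);
    INSERT INTO biograph VALUES
        ('A21', '1968-07-01'), ('AAR', '2002-03-30'), ('ABB', '1998-09-10'),
        ('ABD', '2005-03-15'), ('ABE', '1986-01-01');
    INSERT INTO babtissue VALUES
        ('A21', 'BLOOD', 'Y'), ('A21', 'BLOOD', 'Y'), ('A21', 'TISSUE', 'N'),
        ('ABB', 'BLOOD', 'Y'), ('ABB', 'TISSUE', 'Y');
    INSERT INTO dna VALUES
        ('ABB', 'GDNA', 'N'), ('ABB', 'WGA', 'Y'), ('ACC', 'WGA', 'N'),
        ('ALE', 'GDNA', 'Y'), ('ALE', 'GDNA', 'Y');
""")

# Same join shape as the query in the question, restricted to one animal.
double_join = """
    FROM biograph b
    LEFT JOIN babtissue t ON b.name = t.name
    LEFT JOIN dna d ON b.name = d.name
    WHERE b.name = 'ABB'
"""

# Number of rows the join produces for ABB before any aggregation.
joined_rows = conn.execute("SELECT COUNT(*)" + double_join).fetchone()[0]

# Blood count computed over those multiplied rows.
inflated_blood = conn.execute(
    "SELECT SUM(CASE WHEN t.sample_type = 'BLOOD' AND t.avail = 'Y' "
    "THEN 1 ELSE 0 END)" + double_join
).fetchone()[0]

print(joined_rows, inflated_blood)
```

The sample data lists only one available blood sample for ABB, so any fix has to aggregate each child table before (or separately from) joining it to the other.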

You can try to break it up. I don't know if this syntax will work for your database, but I believe that it follows ANSI standards, so give it a shot...

SELECT
    SQ.name,
    SUM(CASE WHEN T.sample_type = 'BLOOD' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS bloodsamps,
    SUM(CASE WHEN T.sample_type = 'TISSUE' AND T.avail = 'Y' THEN 1 ELSE 0 END) AS tissuesamps,
    SQ.gdnas,
    SQ.wgas
FROM
    (
    SELECT
        B.name,
        SUM(CASE WHEN D.sample_type = 'GDNA' AND D.avail = 'Y' THEN 1 ELSE 0 END) AS gdnas,
        SUM(CASE WHEN D.sample_type = 'WGA' AND D.avail = 'Y' THEN 1 ELSE 0 END) AS wgas
    FROM
        biograph B
    LEFT JOIN dna D ON D.name = B.name
    GROUP BY
        B.name
    ) AS SQ
LEFT JOIN babtissue T on T.name = SQ.name
WHERE SQ.name is not NULL
GROUP BY SQ.name, SQ.gdnas, SQ.wgas
ORDER BY SQ.name

Can the name really be NULL?

I don't know about the "avail" column, but this should give you the other columns you're looking for:

SELECT  b.name,
        COALESCE (t.bloodsamps,  0) AS bloodsamps,
        COALESCE (t.tissuesamps, 0) AS tissuesamps,
        COALESCE (d.gdnas, 0) AS gdnas,
        COALESCE (d.wgas,  0) AS wgas
    FROM biograph b
    LEFT JOIN (
        SELECT  name,
                SUM(CASE WHEN sample_type = 'BLOOD'  THEN 1 ELSE 0 END) AS bloodsamps,
                SUM(CASE WHEN sample_type = 'TISSUE' THEN 1 ELSE 0 END) AS tissuesamps
            FROM babtissue
            WHERE avail = 'Y'
            GROUP BY name
        ) t
        ON (t.name = b.name)
    LEFT JOIN (
        SELECT  name,
                SUM(CASE WHEN sample_type = 'GDNA' THEN 1 ELSE 0 END) AS gdnas,
                SUM(CASE WHEN sample_type = 'WGA'  THEN 1 ELSE 0 END) AS wgas
            FROM dna
            WHERE avail = 'Y'
            GROUP BY name
        ) d
        ON (d.name = b.name)
;
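
As a quick check of the pre-aggregation approach, the sketch below runs the same query shape against the sample rows from the question. It again uses Python's sqlite3 module as a stand-in for PostgreSQL, with assumed column types, and the `counts` dictionary is just a convenience for inspecting the result.

```python
import sqlite3

# In-memory stand-in for the PostgreSQL database; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE biograph (name TEXT, birth TEXT);
    CREATE TABLE babtissue (name TEXT, sample_type TEXT, avail TEXT);
    CREATE TABLE dna (name TEXT, sample_type TEXT, avail TEXT);
    INSERT INTO biograph VALUES
        ('A21', '1968-07-01'), ('AAR', '2002-03-30'), ('ABB', '1998-09-10'),
        ('ABD', '2005-03-15'), ('ABE', '1986-01-01');
    INSERT INTO babtissue VALUES
        ('A21', 'BLOOD', 'Y'), ('A21', 'BLOOD', 'Y'), ('A21', 'TISSUE', 'N'),
        ('ABB', 'BLOOD', 'Y'), ('ABB', 'TISSUE', 'Y');
    INSERT INTO dna VALUES
        ('ABB', 'GDNA', 'N'), ('ABB', 'WGA', 'Y'), ('ACC', 'WGA', 'N'),
        ('ALE', 'GDNA', 'Y'), ('ALE', 'GDNA', 'Y');
""")

# Each child table is collapsed to one row per name before it is joined,
# so the two joins can no longer multiply each other's rows.
query = """
    SELECT b.name,
           COALESCE(t.bloodsamps, 0),
           COALESCE(t.tissuesamps, 0),
           COALESCE(d.gdnas, 0),
           COALESCE(d.wgas, 0)
    FROM biograph b
    LEFT JOIN (
        SELECT name,
               SUM(CASE WHEN sample_type = 'BLOOD'  THEN 1 ELSE 0 END) AS bloodsamps,
               SUM(CASE WHEN sample_type = 'TISSUE' THEN 1 ELSE 0 END) AS tissuesamps
        FROM babtissue
        WHERE avail = 'Y'
        GROUP BY name
    ) t ON t.name = b.name
    LEFT JOIN (
        SELECT name,
               SUM(CASE WHEN sample_type = 'GDNA' THEN 1 ELSE 0 END) AS gdnas,
               SUM(CASE WHEN sample_type = 'WGA'  THEN 1 ELSE 0 END) AS wgas
        FROM dna
        WHERE avail = 'Y'
        GROUP BY name
    ) d ON d.name = b.name
    ORDER BY b.name
"""

# Map each name to (bloodsamps, tissuesamps, gdnas, wgas).
counts = {row[0]: tuple(row[1:]) for row in conn.execute(query)}
for name, values in counts.items():
    print(name, values)
```

Only names present in biograph appear in the result, so ACC and ALE are absent with this particular sample data even though they have rows in dna.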
