
Getting SUM() on distinct rows in MySQL

I have a table ("dump") with transactions, and I want to list the total amount, grouped by category, per month, like: Month | Category | Category ID | SUM. The tables involved look like this:

TABLE dump:
id INT
date DATE
event VARCHAR(100)
amount DECIMAL(10, 2)
TABLE dump_cat:
id INT
did INT (id in dump)
cid INT (id in categories)
TABLE categories:
id INT
name VARCHAR(100)

Now the query I'm trying to use is:

SELECT SUBSTR(d.date,1,7) AS month, c.name, c.id AS catid, SUM(d.amount) AS sum
 FROM dump as d, dump_cat as dc, categories AS c
 WHERE dc.did = d.id AND c.id = dc.cid AND SUBSTR(d.date, 1, 7) >= '2008-08'
 GROUP BY month, c.name ORDER BY month;

But the sum for most categories is twice as big as it should be. My guess is that this is because the join returns multiple rows, but adding "DISTINCT d.id" in the field list doesn't make any difference. An example of what the query returns is:

+---------+--------------------------+-------+-----------+
| month   | name                     | catid | sum       |
+---------+--------------------------+-------+-----------+
| 2008-08 | Cash                     |    21 |  -6200.00 |
| 2008-08 | Gas                      |     8 |  -2936.19 |
| 2008-08 | Rent                     |     1 | -15682.00 |
+---------+--------------------------+-------+-----------+

whereas

SELECT DISTINCT d.id, d.amount FROM dump AS d, dump_cat AS dc
 WHERE d.id = dc.did AND SUBSTR(d.date, 1, 7) ='2008-08' AND dc.cid = 21;

returns

+------+----------+
| id   | amount   |
+------+----------+
| 3961 |  -600.00 | 
| 2976 |  -200.00 | 
| 2967 |  -400.00 | 
| 2964 |  -200.00 | 
| 2957 |  -300.00 | 
| 2962 | -1400.00 | 
+------+----------+

That makes a total of -3100, half of the sum listed above (-6200). If I remove "DISTINCT d.id" from the last query, every row is listed twice. I think this is the problem, but I need help figuring out how to solve it. Thanks in advance.
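To see the mechanism, here is a minimal sketch that rebuilds a tiny, hypothetical copy of the three tables in an in-memory SQLite database (assumed here as a stand-in for MySQL, with whole-number amounts instead of DECIMAL) and stores each dump_cat link row twice. The question's join then reports twice the real total:

```python
import sqlite3

# SQLite (stdlib) stands in for MySQL; all rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dump (id INTEGER, date TEXT, event TEXT, amount INTEGER);
CREATE TABLE dump_cat (id INTEGER PRIMARY KEY, did INTEGER, cid INTEGER);
CREATE TABLE categories (id INTEGER, name TEXT);
INSERT INTO categories VALUES (21, 'Cash');
INSERT INTO dump VALUES (1, '2008-08-02', 'ATM', -600), (2, '2008-08-09', 'ATM', -200);
-- Each transaction is linked to category 21 twice (duplicate link rows).
INSERT INTO dump_cat (did, cid) VALUES (1, 21), (1, 21), (2, 21), (2, 21);
""")

# Same shape as the question's query, written with explicit JOINs.
rows = conn.execute("""
SELECT SUBSTR(d.date, 1, 7) AS month, c.name, c.id AS catid, SUM(d.amount) AS total
FROM dump AS d
JOIN dump_cat AS dc ON dc.did = d.id
JOIN categories AS c ON c.id = dc.cid
GROUP BY month, c.id, c.name
""").fetchall()

# The real total, summed straight from dump without the join.
true_total = conn.execute("SELECT SUM(amount) FROM dump").fetchone()[0]
print(rows)        # [('2008-08', 'Cash', 21, -1600)]
print(true_total)  # -800
```

Each dump row matches two dump_cat rows, so the join emits it twice and SUM() sees every amount twice.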

Added: If I collect the dump and dump_cat tables into one, with

CREATE table dumpwithcat SELECT DISTINCT d.id, d.date, d.event, d.amount, dc.cid
  FROM dump AS d, dump_cat AS c WHERE c.did = d.id;

and run the query on that table, everything works fine and the sum is correct. Is there a way to do this in the original query, with a subquery or something like that?

> That makes a total of 3100, half of the sum listed above. If I remove "DISTINCT d.id" from the last query, every row is listed twice.

While you may have only one category per dump, you must therefore have multiple rows in dump_cat per dump. You should consider defining a UNIQUE constraint to ensure only one row exists per pair of did, cid:

ALTER TABLE dump_cat ADD CONSTRAINT UNIQUE (did, cid);

I predict this statement will fail given the current data in your table. It can't create a unique constraint when these columns already contain duplicates!
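A quick way to confirm this prediction is a sketch in SQLite (assumed as a stand-in for MySQL; SQLite has no ALTER TABLE ... ADD CONSTRAINT, so a UNIQUE index plays the same role, and the index name is hypothetical). Building the index over a column pair that already holds duplicates is rejected:

```python
import sqlite3

# SQLite (stdlib) stands in for MySQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dump_cat (id INTEGER PRIMARY KEY, did INTEGER, cid INTEGER);
INSERT INTO dump_cat (did, cid) VALUES (1, 2), (3, 4), (3, 4);  -- (3, 4) is duplicated
""")

try:
    # Plays the role of: ALTER TABLE dump_cat ADD CONSTRAINT UNIQUE (did, cid)
    conn.execute("CREATE UNIQUE INDEX dump_cat_did_cid ON dump_cat (did, cid)")
    constraint_added = True
except sqlite3.IntegrityError as exc:
    constraint_added = False
    print("rejected:", exc)

print(constraint_added)  # False
```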

You can remove duplicates this way, for instance:

DELETE dc1 FROM dump_cat dc1 JOIN dump_cat dc2 USING (did, cid)
WHERE dc1.id > dc2.id; -- keeps the lowest id in each (did, cid) group, deletes the later copies
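SQLite does not accept MySQL's multi-table DELETE ... JOIN syntax, so this sketch (again assuming SQLite as a stand-in) expresses the same rule with a correlated EXISTS: delete any row for which another row with the same (did, cid) and a smaller id exists. MySQL typically rejects a subquery that reads from the table being deleted from (error 1093), which is one reason to prefer the JOIN form there.

```python
import sqlite3

# SQLite (stdlib) stands in for MySQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dump_cat (id INTEGER PRIMARY KEY, did INTEGER, cid INTEGER);
INSERT INTO dump_cat (did, cid) VALUES (1, 2), (3, 4), (3, 4);
""")

# Keep the lowest id in each (did, cid) group and delete every later copy.
cur = conn.execute("""
DELETE FROM dump_cat
WHERE EXISTS (SELECT 1 FROM dump_cat AS dc2
              WHERE dc2.did = dump_cat.did
                AND dc2.cid = dump_cat.cid
                AND dc2.id < dump_cat.id)
""")
print(cur.rowcount)  # 1

# With the duplicates gone, the unique index can now be created.
conn.execute("CREATE UNIQUE INDEX dump_cat_did_cid ON dump_cat (did, cid)")
remaining = conn.execute("SELECT id, did, cid FROM dump_cat ORDER BY id").fetchall()
print(remaining)  # [(1, 1, 2), (2, 3, 4)]
```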

edit: By the way, don't mark my answer as accepted until you have verified that I'm correct! :-)

You can verify that there are in fact duplicates, as I suggest, by using a query like the following:

SELECT did, COUNT(*)
FROM dump_cat
GROUP BY did
HAVING COUNT(*) > 1;
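Here is the check run against a small hypothetical dump_cat in SQLite (a stand-in for MySQL). Note that grouping by did alone also flags transactions that legitimately belong to several categories (did 5 below), so grouping by the pair (did, cid) is a stricter variation, not from the answer, that isolates exact duplicates:

```python
import sqlite3

# SQLite (stdlib) stands in for MySQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dump_cat (id INTEGER PRIMARY KEY, did INTEGER, cid INTEGER);
-- did 3 has a duplicated link; did 5 is genuinely in two categories.
INSERT INTO dump_cat (did, cid) VALUES (1, 2), (3, 4), (3, 4), (5, 6), (5, 7);
""")

# The query from the answer: transactions with more than one link row.
by_did = conn.execute("""
SELECT did, COUNT(*) FROM dump_cat GROUP BY did HAVING COUNT(*) > 1 ORDER BY did
""").fetchall()

# Stricter variation: only exact (did, cid) duplicates.
by_pair = conn.execute("""
SELECT did, cid, COUNT(*) FROM dump_cat GROUP BY did, cid HAVING COUNT(*) > 1
""").fetchall()

print(by_did)   # [(3, 2), (5, 2)]
print(by_pair)  # [(3, 4, 2)]
```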

Another possibility: do you have more than one category with the same name? (Sorry, my first try at this query was wrong; here's an edited version.)

SELECT c.name, GROUP_CONCAT(c.id) AS cat_id_list, COUNT(*) AS cnt
FROM categories c
GROUP BY c.name
HAVING COUNT(*) > 1;
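Such rows matter because the question's GROUP BY month, c.name would merge the amounts of two different categories into one line. A sketch against SQLite (a stand-in for MySQL; both provide GROUP_CONCAT), with hypothetical rows where two categories share a name, shows the check surfacing both ids:

```python
import sqlite3

# SQLite (stdlib) stands in for MySQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
-- Two distinct categories share the name 'Cash'.
INSERT INTO categories VALUES (1, 'Rent'), (8, 'Gas'), (21, 'Cash'), (22, 'Cash');
""")

dupes = conn.execute("""
SELECT c.name, GROUP_CONCAT(c.id) AS cat_id_list, COUNT(*) AS cnt
FROM categories AS c
GROUP BY c.name
HAVING COUNT(*) > 1
""").fetchall()

name, id_list, cnt = dupes[0]
# GROUP_CONCAT order is not guaranteed, so sort the ids before comparing.
print(name, sorted(int(i) for i in id_list.split(",")), cnt)  # Cash [21, 22] 2
```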

FWIW, I did test the DELETE command I showed:

INSERT INTO dump_cat (did, cid) VALUES (1, 2), (3,4), (3,4); -- duplicates!

DELETE dc1 FROM dump_cat dc1 JOIN dump_cat dc2 USING (did, cid) WHERE dc1.id > dc2.id;
Query OK, 1 row affected (0.00 sec)

PS: This is tangential to your question, but the DISTINCT query modifier always applies to the whole row, not just the first column. This is a common misunderstanding among SQL programmers.
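A tiny SQLite sketch (hypothetical table t, SQLite standing in for MySQL) shows that SELECT DISTINCT id, amount removes only rows that are identical in both columns, so the same id can still appear more than once:

```python
import sqlite3

# SQLite (stdlib) stands in for MySQL; table t and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, amount INTEGER);
INSERT INTO t VALUES (1, -600), (1, -600), (1, -200), (2, -200);
""")

# DISTINCT compares the whole (id, amount) row, not just id.
rows = conn.execute("SELECT DISTINCT id, amount FROM t ORDER BY id, amount").fetchall()
print(rows)  # [(1, -600), (1, -200), (2, -200)]
```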

At first glance, it looks to me like you might have the referential integrity constraint between Dump and Dump_Cat backwards.

Can transactions (in Dump) be in multiple categories? If not, then shouldn't the transaction table (Dump) specify which category each transaction is in, and not the other way around? That is, shouldn't there be a CatId in the Dump table, rather than a DumpId in the Cat table?
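If each transaction really has exactly one category, the one-to-many design suggested here can be sketched as below (SQLite as a stand-in for MySQL; the cid column on dump and all sample rows are hypothetical). With no link table, each transaction appears exactly once in the join, so it can only be counted once:

```python
import sqlite3

# SQLite (stdlib) stands in for MySQL; column dump.cid and the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
-- The category id lives on the transaction row itself.
CREATE TABLE dump (
    id INTEGER PRIMARY KEY,
    date TEXT,
    event TEXT,
    amount INTEGER,
    cid INTEGER REFERENCES categories (id)
);
INSERT INTO categories VALUES (1, 'Rent'), (21, 'Cash');
INSERT INTO dump VALUES
    (1, '2008-08-02', 'ATM', -600, 21),
    (2, '2008-08-09', 'ATM', -200, 21),
    (3, '2008-08-01', 'August rent', -7841, 1);
""")

rows = conn.execute("""
SELECT SUBSTR(d.date, 1, 7) AS month, c.name, c.id AS catid, SUM(d.amount) AS total
FROM dump AS d
JOIN categories AS c ON c.id = d.cid
GROUP BY month, c.id, c.name
ORDER BY month, c.id
""").fetchall()
print(rows)  # [('2008-08', 'Rent', 1, -7841), ('2008-08', 'Cash', 21, -800)]
```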

If transactions can be in multiple categories, then your data structure is correct, but then you will unavoidably be double- (or multiply-) counting transaction amounts in any aggregate query, because the transaction amount is in fact in multiple categories.

If dump records can be in multiple categories, each of them will contribute to the row of every category it belongs to for that month.

One solution is to also pull a COUNT() of categories for each dump record and use it as a divisor for the individual amounts. That way each amount is split evenly across all the categories the dump record belongs to, and the overall total stays intact.

Something like this (sorry, MySQL isn't my daily RDBMS, so I'm unsure of the exact syntax):

 SELECT SUBSTR(d.date,1,7) AS month, c.name, c.id AS catid,
   SUM(d.amount / (SELECT COUNT(*) FROM dump_cat dc2 WHERE dc2.did=d.id)) AS sum
 FROM dump as d, dump_cat as dc, categories AS c
 WHERE dc.did = d.id AND c.id = dc.cid AND SUBSTR(d.date, 1, 7) >= '2008-08'
 GROUP BY month, c.id, c.name ORDER BY month;
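The apportioning idea can be checked in SQLite (a stand-in for MySQL) with hypothetical data where one -900 transaction sits in two categories. This sketch is a variation of the query above: it computes each transaction's category count once in a derived table (k) instead of a correlated subquery, and multiplies by 1.0 because SQLite's / truncates integer division (MySQL's / does not). The per-category amounts then add back up to the real total:

```python
import sqlite3

# SQLite (stdlib) stands in for MySQL; all rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dump (id INTEGER PRIMARY KEY, date TEXT, event TEXT, amount INTEGER);
CREATE TABLE dump_cat (id INTEGER PRIMARY KEY, did INTEGER, cid INTEGER);
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO categories VALUES (1, 'Food'), (2, 'Gas');
INSERT INTO dump VALUES (1, '2008-08-03', 'Groceries and fuel', -900),
                        (2, '2008-08-05', 'Groceries', -100);
-- Transaction 1 belongs to both categories, transaction 2 only to Food.
INSERT INTO dump_cat (did, cid) VALUES (1, 1), (1, 2), (2, 1);
""")

rows = conn.execute("""
SELECT SUBSTR(d.date, 1, 7) AS month, c.name, c.id AS catid,
       SUM(d.amount * 1.0 / k.n) AS total
FROM dump AS d
JOIN dump_cat AS dc ON dc.did = d.id
JOIN categories AS c ON c.id = dc.cid
-- k.n = number of categories each transaction is split across
JOIN (SELECT did, COUNT(*) AS n FROM dump_cat GROUP BY did) AS k ON k.did = d.id
WHERE SUBSTR(d.date, 1, 7) >= '2008-08'
GROUP BY month, c.id, c.name
ORDER BY c.id
""").fetchall()

print(rows)                     # [('2008-08', 'Food', 1, -550.0), ('2008-08', 'Gas', 2, -450.0)]
print(sum(r[3] for r in rows))  # -1000.0
```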

You can take just about any query, like the one you used to create the distinct table, and select from it. Just give the query a "table name" (an alias).

SELECT SUBSTR(d_dc.date,1,7) AS month, c.name, c.id AS catid, SUM(d_dc.amount) AS sum
FROM (SELECT DISTINCT d.id, d.date, d.event, d.amount, dc.cid
    FROM dump AS d JOIN dump_cat AS dc ON dc.did = d.id
    WHERE SUBSTR(d.date, 1, 7) >= '2008-08') AS d_dc
JOIN categories AS c ON d_dc.cid = c.id
GROUP BY month, c.id, c.name ORDER BY month;

That's probably not the most efficient way to do your query, and I may have gotten some of the table aliases wrong, but it should give you an idea of how to do it.
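As a sanity check, here is the derived-table query run in SQLite (a stand-in for MySQL) against a tiny hypothetical data set in which every dump_cat link row is duplicated. The DISTINCT inside the derived table collapses each duplicated (transaction, category) pair before SUM() runs, so each amount is counted once:

```python
import sqlite3

# SQLite (stdlib) stands in for MySQL; all rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dump (id INTEGER, date TEXT, event TEXT, amount INTEGER);
CREATE TABLE dump_cat (id INTEGER PRIMARY KEY, did INTEGER, cid INTEGER);
CREATE TABLE categories (id INTEGER, name TEXT);
INSERT INTO categories VALUES (21, 'Cash');
INSERT INTO dump VALUES (1, '2008-08-02', 'ATM', -600), (2, '2008-08-09', 'ATM', -200);
INSERT INTO dump_cat (did, cid) VALUES (1, 21), (1, 21), (2, 21), (2, 21);  -- duplicated links
""")

rows = conn.execute("""
SELECT SUBSTR(d_dc.date, 1, 7) AS month, c.name, c.id AS catid, SUM(d_dc.amount) AS total
FROM (SELECT DISTINCT d.id, d.date, d.event, d.amount, dc.cid
      FROM dump AS d
      JOIN dump_cat AS dc ON dc.did = d.id
      WHERE SUBSTR(d.date, 1, 7) >= '2008-08') AS d_dc
JOIN categories AS c ON d_dc.cid = c.id
GROUP BY month, c.id, c.name
ORDER BY month
""").fetchall()
print(rows)  # [('2008-08', 'Cash', 21, -800)]
```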

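Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.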

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM