MYSQL sum() for distinct rows

I'm looking for help using sum() in my SQL query:

SELECT links.id, 
       count(DISTINCT stats.id) as clicks, 
       count(DISTINCT conversions.id) as conversions, 
       sum(conversions.value) as conversion_value 
FROM links 
LEFT OUTER JOIN stats ON links.id = stats.parent_id 
LEFT OUTER JOIN conversions ON links.id = conversions.link_id 
GROUP BY links.id 
ORDER BY links.created desc;

I use DISTINCT because I'm doing "group by" and this ensures the same row is not counted more than once.

The problem is that SUM(conversions.value) counts the "value" for each row more than once (due to the group by).

I basically want to do SUM(conversions.value) for each DISTINCT conversions.id.

Is that possible?

I may be wrong, but from what I understand:

  • conversions.id is the primary key of your table conversions
  • stats.id is the primary key of your table stats

Thus for each conversions.id you have at most one links.id impacted.

Your request is a bit like taking the Cartesian product of two sets:

[clicks]
SELECT *
FROM links 
LEFT OUTER JOIN stats ON links.id = stats.parent_id 

[conversions]
SELECT *
FROM links 
LEFT OUTER JOIN conversions ON links.id = conversions.link_id 

and for each link, you get sizeof([clicks]) x sizeof([conversions]) rows.

As you noted, the number of unique conversions in your request can be obtained via

count(distinct conversions.id) = sizeof([conversions])

The DISTINCT removes the duplicates that the [clicks] rows introduce in the Cartesian product,

but clearly

sum(conversions.value) = sum([conversions].value) * sizeof([clicks])
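The row multiplication described here can be reproduced on a small data set. The following is a minimal sketch, assuming hypothetical sample rows (one link, three clicks, two conversions worth 10 and 5) and using Python's sqlite3 module as a stand-in for MySQL; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data: one link with 3 clicks and 2 conversions.
# SQLite is used here only as a convenient stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE links (id INTEGER PRIMARY KEY, created TEXT);
    CREATE TABLE stats (id INTEGER PRIMARY KEY, parent_id INTEGER);
    CREATE TABLE conversions (id INTEGER PRIMARY KEY, link_id INTEGER, value INTEGER);
    INSERT INTO links VALUES (1, '2024-01-01');
    INSERT INTO stats VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO conversions VALUES (1, 1, 10), (2, 1, 5);
""")

# Same joins as in the question; COUNT(*) exposes the size of the product.
row = conn.execute("""
    SELECT COUNT(*),
           COUNT(DISTINCT stats.id),
           COUNT(DISTINCT conversions.id),
           SUM(conversions.value)
    FROM links
    LEFT OUTER JOIN stats ON links.id = stats.parent_id
    LEFT OUTER JOIN conversions ON links.id = conversions.link_id
    GROUP BY links.id
""").fetchone()

joined_rows, clicks, conversions, inflated_sum = row
print(joined_rows, clicks, conversions, inflated_sum)  # 6 3 2 45
```

The two joins produce 3 x 2 = 6 rows for the link, so the true total of 15 is counted once per click and comes out as 45.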

In your case, since

count(*) = sizeof([clicks]) x sizeof([conversions])
count(*) = sizeof([clicks]) x count(distinct conversions.id)

you have

sizeof([clicks]) = count(*)/count(distinct conversions.id)

so I would test your request with:

SELECT links.id, 
   count(DISTINCT stats.id) as clicks, 
   count(DISTINCT conversions.id) as conversions, 
   sum(conversions.value)*count(DISTINCT conversions.id)/count(*) as conversion_value 
FROM links 
LEFT OUTER JOIN stats ON links.id = stats.parent_id 
LEFT OUTER JOIN conversions ON links.id = conversions.link_id 
GROUP BY links.id 
ORDER BY links.created desc;

Keep me posted! Jerome

Jerome's solution is actually wrong and can produce incorrect results!

sum(conversions.value)*count(DISTINCT conversions.id)/count(*) as conversion_value

Let's assume the following table:

conversions
id value
1 5
1 5
1 5
2 2
3 1

The correct sum of value for distinct ids would be 8. Jerome's formula produces:

sum(conversions.value) = 18
count(distinct conversions.id) = 3
count(*) = 5
18*3/5 = 10.8 != 8
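The arithmetic is easy to check by hand or in a few lines of code. This is a small sketch in plain Python that uses only the five sample rows from the table above; the variable names are illustrative.

```python
# The sample rows from the counterexample, as (id, value) pairs.
rows = [(1, 5), (1, 5), (1, 5), (2, 2), (3, 1)]

total = sum(value for _, value in rows)       # SUM(value)
distinct_ids = len({i for i, _ in rows})      # COUNT(DISTINCT id)
row_count = len(rows)                         # COUNT(*)

# The formula under test: SUM(value) * COUNT(DISTINCT id) / COUNT(*)
formula_result = total * distinct_ids / row_count

# One value per distinct id (dict() keeps a single value for each id).
correct_result = sum(dict(rows).values())

print(total, distinct_ids, row_count)   # 18 3 5
print(formula_result, correct_result)   # 10.8 8
```

The formula only recovers the right total when every id is repeated the same number of times; here id 1 appears three times while ids 2 and 3 appear once.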

For an explanation of why you were seeing incorrect numbers, read this.

I think that Jerome has a handle on what is causing your error. Bryson's query would work, though having that subquery in the SELECT could be inefficient.

Use the following query:

SELECT links.id
  , (
    SELECT COUNT(*)
    FROM stats
    WHERE links.id = stats.parent_id
  ) AS clicks
  , conversions.conversions
  , conversions.conversion_value
FROM links
LEFT JOIN (
  SELECT link_id
    , COUNT(id) AS conversions
    , SUM(conversions.value) AS conversion_value
  FROM conversions
  GROUP BY link_id
) AS conversions ON links.id = conversions.link_id
ORDER BY links.created DESC
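To see how this behaves, here is a minimal sketch that runs the query above unchanged against hypothetical sample data. It assumes SQLite (via Python's sqlite3) as a stand-in for MySQL, and the rows are invented for illustration: link 1 has three clicks and two conversions, link 2 has neither.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE links (id INTEGER PRIMARY KEY, created TEXT);
    CREATE TABLE stats (id INTEGER PRIMARY KEY, parent_id INTEGER);
    CREATE TABLE conversions (id INTEGER PRIMARY KEY, link_id INTEGER, value INTEGER);
    INSERT INTO links VALUES (1, '2024-01-01'), (2, '2024-01-02');
    INSERT INTO stats VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO conversions VALUES (1, 1, 10), (2, 1, 5);
""")

# Conversions are aggregated per link before the join, so each link
# matches at most one pre-aggregated row and nothing is multiplied.
rows = conn.execute("""
    SELECT links.id
      , (
        SELECT COUNT(*)
        FROM stats
        WHERE links.id = stats.parent_id
      ) AS clicks
      , conversions.conversions
      , conversions.conversion_value
    FROM links
    LEFT JOIN (
      SELECT link_id
        , COUNT(id) AS conversions
        , SUM(conversions.value) AS conversion_value
      FROM conversions
      GROUP BY link_id
    ) AS conversions ON links.id = conversions.link_id
    ORDER BY links.created DESC
""").fetchall()

print(rows)  # [(2, 0, None, None), (1, 3, 2, 15)]
```

Links without conversions come back with NULL in the last two columns; wrapping them in COALESCE(..., 0) would turn those into zeros if that is preferred.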

I use a subquery to do this. It eliminates the problems with grouping. So the query would be something like:

SELECT COUNT(DISTINCT conversions.id)
...
     (SELECT SUM(conversions.value) FROM ....) AS Vals
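One way the elided parts might be filled in is sketched below. This is an assumption, not the answerer's actual query: the correlated subquery, the alias c, and the sample rows are all hypothetical, and SQLite (via Python's sqlite3) stands in for MySQL.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE links (id INTEGER PRIMARY KEY, created TEXT);
    CREATE TABLE stats (id INTEGER PRIMARY KEY, parent_id INTEGER);
    CREATE TABLE conversions (id INTEGER PRIMARY KEY, link_id INTEGER, value INTEGER);
    INSERT INTO links VALUES (1, '2024-01-01'), (2, '2024-01-02');
    INSERT INTO stats VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO conversions VALUES (1, 1, 10), (2, 1, 5);
""")

# The counts still come from the joined rows, but the sum is computed by a
# correlated subquery that reads conversions directly, so the extra rows
# produced by the join with stats cannot inflate it.
rows = conn.execute("""
    SELECT links.id,
           COUNT(DISTINCT stats.id) AS clicks,
           COUNT(DISTINCT conversions.id) AS conversions,
           (SELECT SUM(c.value)
            FROM conversions c
            WHERE c.link_id = links.id) AS Vals
    FROM links
    LEFT OUTER JOIN stats ON links.id = stats.parent_id
    LEFT OUTER JOIN conversions ON links.id = conversions.link_id
    GROUP BY links.id
    ORDER BY links.created DESC
""").fetchall()

print(rows)  # [(2, 0, 0, None), (1, 3, 2, 15)]
```

The subquery runs once per link, which is the inefficiency mentioned elsewhere in this thread; on large tables a pre-aggregated join may be the better trade-off.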

How about something like this:

select t.id, max(t.clicks) clicks, count(t.conversions) conversions, sum(t.conversion_value) conversion_value
from    (SELECT l.id id, l.created created,
               c.id conversions,
               count(distinct s.id) clicks,  -- clicks per link, repeated on each conversion row
               max(c.value) conversion_value -- one value per conversion, however many clicks matched
        FROM links l
        LEFT JOIN stats s ON l.id = s.parent_id
        LEFT JOIN conversions c ON l.id = c.link_id
        GROUP BY l.id, l.created, c.id) t
group by t.id, t.created
order by t.created

This will do the trick: divide each conversion's summed value by the number of times that conversion id is duplicated.

SELECT a.id,
       MAX(a.clicks) AS clicks,
       SUM(a.conversion_value / a.duplicates) AS conversion_value,
       COUNT(a.conversion_id) AS conversions
FROM (SELECT links.id,
             links.created,
             conversions.id AS conversion_id,
             COUNT(DISTINCT stats.id) AS clicks,
             COUNT(conversions.id) AS duplicates,  -- times this conversion row was repeated
             SUM(conversions.value) AS conversion_value
      FROM links
      LEFT OUTER JOIN stats ON links.id = stats.parent_id
      LEFT OUTER JOIN conversions ON links.id = conversions.link_id
      GROUP BY links.id, links.created, conversions.id) AS a
GROUP BY a.id, a.created
ORDER BY a.created DESC;

SELECT x.id,
       SUM(x.value) AS conversion_value,
       MAX(x.clicks) AS clicks,
       SUM(x.conversions) AS conversions
FROM
(SELECT links.id,
        links.created,
        COUNT(DISTINCT stats.id) AS clicks,
        COUNT(DISTINCT conversions.id) AS conversions,
        conversions.value
 FROM links
 LEFT OUTER JOIN stats ON links.id = stats.parent_id
 LEFT OUTER JOIN conversions ON links.id = conversions.link_id
 GROUP BY links.id, links.created, conversions.id, conversions.value) x
GROUP BY x.id, x.created
ORDER BY x.created DESC;

I believe this will give you the answer that you are looking for.
