Left joins --> aggregate function problem

I have four different tables in my database:

thread:

  • thread_id
  • thread_content
  • timestamp

thread_rating:

  • thread_rating_id
  • thread_id
  • liked
  • disliked

thread_report:

  • thread_report_id
  • thread_id

thread_impression:

  • thread_impression_id
  • thread_id

I'm going to join these tables with this SQL query:

SELECT t.thread_id,
t.thread_content,
SUM(tra.liked) AS liked,
SUM(tra.disliked) AS disliked,
t.timestamp,
((100*(tra.liked + SUM(tra.liked))) / (tra.liked + SUM(tra.liked) + (tra.disliked + SUM(tra.disliked)))) AS liked_percent,
((100*(COUNT(DISTINCT tre.thread_report_id)) / ((COUNT(DISTINCT ti.thread_impression_id))))) AS reported_percent
FROM thread AS t
LEFT JOIN thread_rating AS tra ON t.thread_id = tra.thread_id
LEFT JOIN thread_report AS tre ON tra.thread_id = tre.thread_id
LEFT JOIN thread_impression AS ti ON tre.thread_id = ti.thread_id
GROUP BY t.thread_id
ORDER BY liked_percent

The query should return every thread_id with its summed likes and dislikes, the likes as a percentage, the timestamp at which the thread was inserted into the database, and the reports as a percentage of the impressions (the number of times the thread was shown to a user).

Nearly all results are right; the only wrong values are the likes and dislikes.

If I add a COUNT(*) to the query, I can see that the rows with correct results have a count of 1, while the wrong ones sometimes have a count of up to 60. It looks like a cross-join problem.

I think this is an issue with the grouping, or perhaps I should parenthesize the joins.

I've seen solutions with subselects, but I don't think that is a great solution for this issue.

What am I doing wrong here?

The tra table has multiple records per thread_id, and each of those records is repeated once for every matching row in tre and ti. This causes the double counting in the SUM function.
Do the summations in a subselect, grouped by the join field.
That way there is only one row per thread_id in tra2 to join against, and the duplicate rows are avoided.
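
To make the double counting concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the asker's database (the actual engine is not stated in the question). The data is hypothetical: one thread with 2 ratings, 3 reports and 5 impressions. The joins are the ones from the question, and the select list is reduced to the counts that matter.

```python
import sqlite3

# In-memory SQLite database; schema reduced to the columns used in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE thread (thread_id INTEGER PRIMARY KEY, thread_content TEXT);
CREATE TABLE thread_rating (thread_rating_id INTEGER PRIMARY KEY,
                            thread_id INTEGER, liked INTEGER, disliked INTEGER);
CREATE TABLE thread_report (thread_report_id INTEGER PRIMARY KEY, thread_id INTEGER);
CREATE TABLE thread_impression (thread_impression_id INTEGER PRIMARY KEY, thread_id INTEGER);

-- Hypothetical sample data: 1 like and 1 dislike, 3 reports, 5 impressions.
INSERT INTO thread VALUES (1, 'hello');
INSERT INTO thread_rating (thread_id, liked, disliked) VALUES (1, 1, 0), (1, 0, 1);
INSERT INTO thread_report (thread_id) VALUES (1), (1), (1);
INSERT INTO thread_impression (thread_id) VALUES (1), (1), (1), (1), (1);
""")

# Same join chain as in the question: every rating row is repeated
# once per (report, impression) pair before SUM() sees it.
naive_row = conn.execute("""
SELECT COUNT(*)          AS joined_rows,
       SUM(tra.liked)    AS liked,
       SUM(tra.disliked) AS disliked
FROM thread AS t
LEFT JOIN thread_rating AS tra ON t.thread_id = tra.thread_id
LEFT JOIN thread_report AS tre ON tra.thread_id = tre.thread_id
LEFT JOIN thread_impression AS ti ON tre.thread_id = ti.thread_id
GROUP BY t.thread_id
""").fetchone()

print(naive_row)  # (30, 15, 15): 2 * 3 * 5 joined rows, sums inflated 15-fold
```

The true totals are 1 like and 1 dislike; the joined result contains 30 rows, so each rating is summed 15 times.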

SELECT t.thread_id,
  t.thread_content,
  tra2.liked,
  tra2.disliked,
  t.timestamp,
  tra2.liked_percent,
  ((100*(COUNT(DISTINCT tre.thread_report_id)) / ((COUNT(DISTINCT ti.thread_impression_id))))) AS reported_percent
FROM thread AS t
LEFT JOIN (
     SELECT 
       tra.thread_id 
       , SUM(tra.liked) AS liked
       , SUM(tra.disliked) AS disliked
       -- share of likes among all ratings; every term is aggregated, so the value does not depend on an arbitrary row
       , (100 * SUM(tra.liked)) / (SUM(tra.liked) + SUM(tra.disliked)) AS liked_percent 
     FROM thread_rating AS tra
     GROUP BY tra.thread_id
) as tra2 ON t.thread_id = tra2.thread_id
LEFT JOIN thread_report AS tre ON t.thread_id = tre.thread_id
LEFT JOIN thread_impression AS ti ON t.thread_id = ti.thread_id
GROUP BY t.thread_id
ORDER BY liked_percent DESC
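
As a rough check of the pre-aggregation approach, the sketch below again uses Python's sqlite3 module as a stand-in database with hypothetical data: thread 1 has 3 ratings, 2 reports and 4 impressions, while thread 2 has no ratings or reports and a single impression. Two things here are assumptions rather than part of the answer: the reports and impressions are joined on t.thread_id, and the percentage columns are left out to keep the output easy to verify by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE thread (thread_id INTEGER PRIMARY KEY, thread_content TEXT);
CREATE TABLE thread_rating (thread_rating_id INTEGER PRIMARY KEY,
                            thread_id INTEGER, liked INTEGER, disliked INTEGER);
CREATE TABLE thread_report (thread_report_id INTEGER PRIMARY KEY, thread_id INTEGER);
CREATE TABLE thread_impression (thread_impression_id INTEGER PRIMARY KEY, thread_id INTEGER);

-- Hypothetical sample data.
INSERT INTO thread VALUES (1, 'first'), (2, 'second');
INSERT INTO thread_rating (thread_id, liked, disliked)
    VALUES (1, 1, 0), (1, 1, 0), (1, 0, 1);
INSERT INTO thread_report (thread_id) VALUES (1), (1);
INSERT INTO thread_impression (thread_id) VALUES (1), (1), (1), (1), (2);
""")

# Ratings are summed in a derived table first, so tra2 holds at most one row
# per thread and cannot be multiplied by the report/impression joins.
fixed_rows = conn.execute("""
SELECT t.thread_id,
       tra2.liked,
       tra2.disliked,
       COUNT(DISTINCT tre.thread_report_id)    AS reports,
       COUNT(DISTINCT ti.thread_impression_id) AS impressions
FROM thread AS t
LEFT JOIN (
    SELECT thread_id, SUM(liked) AS liked, SUM(disliked) AS disliked
    FROM thread_rating
    GROUP BY thread_id
) AS tra2 ON t.thread_id = tra2.thread_id
LEFT JOIN thread_report AS tre ON t.thread_id = tre.thread_id
LEFT JOIN thread_impression AS ti ON t.thread_id = ti.thread_id
GROUP BY t.thread_id
ORDER BY t.thread_id
""").fetchall()

print(fixed_rows)  # [(1, 2, 1, 2, 4), (2, None, None, 0, 1)]
```

Thread 1 reports 2 likes and 1 dislike, matching its three rating rows, even though the report and impression joins still produce 8 joined rows for it. Thread 2 has no ratings, so its sums come back as NULL (None in Python); wrapping them in COALESCE(..., 0) would be one way to show zeros instead.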
