简体   繁体   中英

Left joins --> aggregate function problem

I have four different tables in my database:

thread:

  • thread_id
  • thread_content
  • timestamp

thread_rating:

  • thread_rating_id
  • thread_id
  • liked
  • disliked

thread_report:

  • thread_report_id
  • thread_id

thread_impression:

  • thread_impression_id
  • thread_id

And I'm going to join on these tables with this SQL-Query

SELECT t.thread_id,
t.thread_content,
SUM(tra.liked) AS liked,
SUM(tra.disliked) AS disliked,
t.timestamp,
((100*(tra.liked + SUM(tra.liked))) / (tra.liked + SUM(tra.liked) + (tra.disliked + SUM(tra.disliked)))) AS liked_percent,
((100*(COUNT(DISTINCT tre.thread_report_id)) / ((COUNT(DISTINCT ti.thread_impression_id))))) AS reported_percent
FROM thread AS t
LEFT JOIN thread_rating AS tra ON t.thread_id = tra.thread_id
LEFT JOIN thread_report AS tre ON tra.thread_id = tre.thread_id
LEFT JOIN thread_impression AS ti ON tre.thread_id = ti.thread_id
GROUP BY t.thread_id
ORDER BY liked_percent

The Query should return all thread_ids with the calculated liked and disliked, the likes in percent, the timestamp, when the thread was inserted into the database and the reports in percent to the impressions (the times, the thread was shown to the user)...

Nearly all results are right, the only results which are not right are the likes and dislikes.

If I put a count(*) in front of the query, I can see, that the right results have a count of 1 and the wrong ones have sometimes a count of up to 60. Seems like there are cross join-problems...

I think that this is an issue with the Grouping, or perhaps I should embrace the Joins.

I've seen solutions with subselects. But I don't think that this is a great solutions for this issue...

What am I doing wrong here?

The tra table has multiple records per thread_id. This caused double counts in the SUM function.
Do the summations in a subselect, grouped by the join field.
That way you will only have one thread_id in tra2 to join with and duplicate rows will be avoided.

SELECT t.thread_id,
  t.thread_content,
  tra2.liked
  tra2.disliked,
  t.timestamp,
  tra2.liked_percent,
  ((100*(COUNT(DISTINCT tre.thread_report_id)) / ((COUNT(DISTINCT ti.thread_impression_id))))) AS reported_percent
FROM thread AS t
LEFT JOIN (
     SELECT 
       tra.thread_id 
       , SUM(tra.liked) AS liked
       , SUM(tra.disliked) AS disliked
       , ((100*(tra.liked + SUM(tra.liked))) / (tra.liked + SUM(tra.liked) + (tra.disliked + SUM(tra.disliked)))) AS liked_percent 
     FROM thread_rating AS tra
     GROUP BY tra.thread_id
) as tra2 ON t.thread_id = tra2.thread_id
LEFT JOIN thread_report AS tre ON tra.thread_id = tre.thread_id
LEFT JOIN thread_impression AS ti ON tre.thread_id = ti.thread_id
GROUP BY t.thread_id
ORDER BY liked_percent DESC

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM