Join query to count the number of records isn't counting right

I have a categories table that works sort of like the tags on StackOverflow. There are categories like JavaScript, which would be a top category, and maybe jQuery, which would be a sub-category of JavaScript.

What I am trying to query is how many records (in my application they are "problems") there are under each category.

Here is the SQL I have so far. problem_categories is just a joining table which has a problem_id and a category_id.

select problems.problem_id , categories.category_id , category_name , count(problems.problem_id) as num_problems , is_top
from problems 
    left join problem_categories on
    problems.problem_id = problem_categories.category_id
    left join categories on
    problem_categories.category_id = categories.category_id
    where is_top = 1;

This returns only one line. What I was hoping for was the number of records for each category where is_top = 1 (which would mean that it is a top category).

How could I change my query to do that?

Thanks!!

Without guessing at the logic, what's wrong is that you need a GROUP BY to count:

SELECT problems.problem_id, categories.category_id, category_name, 
COUNT(problems.problem_id) as num_problems, is_top
FROM problems 
LEFT JOIN problem_categories 
ON problems.problem_id = problem_categories.category_id
LEFT JOIN categories 
ON problem_categories.category_id = categories.category_id
WHERE is_top = 1
GROUP BY problems.problem_id, categories.category_id, category_name;

But if you want the number of problems per top-category, your logic probably goes like this instead:

SELECT category_name, categories.category_id,
COUNT(problems.problem_id) as num_problems
FROM categories 
JOIN problem_categories 
ON problem_categories.category_id = categories.category_id
JOIN problems 
ON problems.problem_id = problem_categories.problem_id
WHERE is_top = 1
GROUP BY category_name, categories.category_id;

Notice that:

  • For each category, you get the problems, and you count those instead.
  • You use JOIN instead of LEFT JOIN, since you don't care about categories that have no problem(s) anyway.
  • You can leave is_top out of the SELECT, since you already filter on it in your WHERE clause. Selecting a column that is not in your GROUP BY can return an arbitrary value from the group, but since all values are 1 here, you can safely select it, or just leave it out.
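As a minimal sketch of the per-category count described above, the following runs the query against a small in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for the asker's database, and the sample rows are made up for illustration; the table and column names follow the question.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (category_id INTEGER PRIMARY KEY, category_name TEXT, is_top INTEGER);
CREATE TABLE problems (problem_id INTEGER PRIMARY KEY);
CREATE TABLE problem_categories (problem_id INTEGER, category_id INTEGER);

INSERT INTO categories VALUES (1, 'JavaScript', 1), (2, 'jQuery', 0), (3, 'PHP', 1), (4, 'Ruby', 1);
INSERT INTO problems VALUES (10), (11), (12);
INSERT INTO problem_categories VALUES (10, 1), (11, 1), (12, 3), (10, 2);
""")

# One output row per top category that has at least one problem.
per_category = conn.execute("""
SELECT categories.category_name, COUNT(problems.problem_id) AS num_problems
FROM categories
JOIN problem_categories ON problem_categories.category_id = categories.category_id
JOIN problems ON problems.problem_id = problem_categories.problem_id
WHERE categories.is_top = 1
GROUP BY categories.category_id, categories.category_name
ORDER BY categories.category_id
""").fetchall()

print(per_category)  # [('JavaScript', 2), ('PHP', 1)]
```

In this sample, Ruby is a top category with no problems, so the inner JOIN drops it from the result, and jQuery is filtered out by is_top = 1.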

"This returns only one line. What I was hoping for is to have returned the number of records where is_top = 1 (That would mean that it is a top category)." “这只返回一行。我希望的是返回is_top = 1的记录数(这意味着它是一个顶级类别)。” I suspect that that is what you are getting. 我怀疑这就是你得到的。

...count(problems.problem_id) as num_problems...
...where is_top = 1;

I think you need to introduce a GROUP BY clause on one or more of your fields.
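To illustrate the point, here is a small sketch that runs the same COUNT with and without a GROUP BY. It assumes an in-memory SQLite database (via Python's sqlite3 module) as a stand-in for the asker's database, with invented sample rows.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (category_id INTEGER PRIMARY KEY, category_name TEXT, is_top INTEGER);
CREATE TABLE problem_categories (problem_id INTEGER, category_id INTEGER);

INSERT INTO categories VALUES (1, 'JavaScript', 1), (2, 'PHP', 1);
INSERT INTO problem_categories VALUES (10, 1), (11, 1), (12, 2);
""")

base_query = """
SELECT COUNT(problem_categories.problem_id) AS num_problems
FROM categories
JOIN problem_categories ON problem_categories.category_id = categories.category_id
WHERE categories.is_top = 1
"""

# Without GROUP BY, the aggregate collapses every matching row into one result row.
without_group = conn.execute(base_query).fetchall()

# With GROUP BY, there is one result row per category.
with_group = conn.execute(
    base_query + " GROUP BY categories.category_id ORDER BY categories.category_id"
).fetchall()

print(without_group)  # [(3,)]
print(with_group)     # [(2,), (1,)]
```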

There are a few things that are amiss in your query:

  1. As mentioned in other answers, the query is missing a GROUP BY clause.

  2. The first JOIN in your query matches unrelated columns: "on problems.problem_id = problem_categories.category_id". As you can see, it joins problem_id with category_id.

  3. Though this might not be a problem with the current data set, to get counts per category it makes more sense to keep the categories table on the extreme left of the LEFT JOIN.
  4. From an optimization point of view, I do not think you need the problems table in your query, because nothing in that table is needed in the output.
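Point 2 can be checked on a small data set. The sketch below compares the join condition from the question with a join on the matching column. It assumes an in-memory SQLite database (via Python's sqlite3 module) standing in for the real one, and the ids are invented so that problem ids and category ids overlap.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE problems (problem_id INTEGER PRIMARY KEY);
CREATE TABLE problem_categories (problem_id INTEGER, category_id INTEGER);

INSERT INTO problems VALUES (1), (2), (3);
INSERT INTO problem_categories VALUES (1, 2), (2, 2), (3, 1);
""")

select_part = """
SELECT problems.problem_id, problem_categories.problem_id, problem_categories.category_id
FROM problems
JOIN problem_categories ON problems.problem_id = problem_categories.{column}
ORDER BY problems.problem_id, problem_categories.problem_id
"""

# Join as written in the question: problem_id compared with category_id.
wrong = conn.execute(select_part.format(column="category_id")).fetchall()

# Join on the matching column: problem_id compared with problem_id.
right = conn.execute(select_part.format(column="problem_id")).fetchall()

print(wrong)  # [(1, 3, 1), (2, 1, 2), (2, 2, 2)]
print(right)  # [(1, 1, 2), (2, 2, 2), (3, 3, 1)]
```

With the mismatched condition, problem 1 is paired with the link row of problem 3, and problem 3 is not paired with anything, so any count built on top of that join is unreliable.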

Please see below another version of your query:

SELECT `categories`.`category_id`, `categories`.`category_name`, COUNT(`problem_categories`.`problem_id`) AS `num_problems`, `categories`.`is_top`
FROM `categories`
LEFT JOIN `problem_categories` ON `problem_categories`.`category_id` = `categories`.`category_id`
WHERE `categories`.`is_top` = 1
GROUP BY `categories`.`category_id`;

Hope the above helps!
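One consequence of putting categories on the left of a LEFT JOIN is that top categories without any problems stay in the result with a count of 0, provided the count is taken on a column of the joined table rather than with COUNT(*). A minimal sketch, assuming an in-memory SQLite database (via Python's sqlite3 module) in place of the real one and invented sample rows:

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (category_id INTEGER PRIMARY KEY, category_name TEXT, is_top INTEGER);
CREATE TABLE problem_categories (problem_id INTEGER, category_id INTEGER);

INSERT INTO categories VALUES (1, 'JavaScript', 1), (2, 'jQuery', 0), (3, 'Ruby', 1);
INSERT INTO problem_categories VALUES (10, 1), (11, 1), (10, 2);
""")

rows = conn.execute("""
SELECT categories.category_name,
       COUNT(problem_categories.problem_id) AS num_problems,  -- ignores NULLs from unmatched rows
       COUNT(*) AS num_rows                                   -- counts the unmatched row too
FROM categories
LEFT JOIN problem_categories ON problem_categories.category_id = categories.category_id
WHERE categories.is_top = 1
GROUP BY categories.category_id, categories.category_name
ORDER BY categories.category_id
""").fetchall()

print(rows)  # [('JavaScript', 2, 2), ('Ruby', 0, 1)]
```

Ruby has no linked problems in this sample, so COUNT(problem_categories.problem_id) reports 0 while COUNT(*) would report 1.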
