
Why doesn't this simple SQL query work?

So I'm working on SQL Server 2008 and I have this query which should be quite simple, but for some reason doesn't work. It basically looks like this:

SELECT TOP 10
    u.Id                AS "UserId",
    u.CreationDate      AS "Member since",
    AVG(q.Score)        AS "Average Question Rating",  
    COUNT(q.Id)         AS "N. of Questions posted by the agent",
    AVG(a.Score)        AS "Average Answer Rating",  
    COUNT(a.Id)         AS "N. of Answers posted by the agent"
FROM    
        Users u, 
        Answers a, 
        Questions q
WHERE q.OwnerUserId = u.Id
AND a.OwnerUserId = u.Id
GROUP BY u.Id, u.CreationDate

When I only work on either the Answers table or the Questions table, everything is OK. But as soon as I try to do both at once (like in the query above), the COUNTs don't work at all. What I get is that COUNT(a.Id) is identical to COUNT(q.Id). So I tried reducing my query to see what was wrong, and I realized that merely adding the Questions or the Answers table to the FROM clause (even without using it anywhere else) while working with the other table was enough to ruin everything.

I'm sure it's something ridiculously trivial that I have overlooked, but it's driving me crazy. I'd be thankful if anybody could point out what went wrong. Thank you in advance.

You're not joining Answers and Questions correctly for the aggregation. Between Answers and Questions, the result is a cartesian product: for every user, every answer is coupled with every question. Both COUNTs therefore return the number of questions multiplied by the number of answers, which is why they are identical.
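
To make the row multiplication concrete, here is a minimal sketch using made-up data: the three CTEs stand in for the real tables, and the Ids and Scores are hypothetical. It assumes one user with 2 questions and 3 answers and runs the same comma-style join as the question.

```sql
-- Hypothetical sample data: one user with 2 questions and 3 answers.
-- The CTE names shadow the real tables for this statement only.
WITH Users (Id) AS (
    SELECT 1
),
Questions (Id, OwnerUserId, Score) AS (
    SELECT 101, 1, 10 UNION ALL
    SELECT 102, 1, 20
),
Answers (Id, OwnerUserId, Score) AS (
    SELECT 201, 1, 1 UNION ALL
    SELECT 202, 1, 2 UNION ALL
    SELECT 203, 1, 3
)
SELECT
    u.Id        AS UserId,
    COUNT(*)    AS JoinedRows,     -- 2 questions x 3 answers = 6
    COUNT(q.Id) AS QuestionCount,  -- 6, not 2
    COUNT(a.Id) AS AnswerCount     -- 6, not 3
FROM Users u, Answers a, Questions q
WHERE q.OwnerUserId = u.Id
  AND a.OwnerUserId = u.Id
GROUP BY u.Id;
```

Every question row is paired with every answer row of the same user, so both counts see all 6 joined rows.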

The simplest way to correct this is to perform aggregation in subqueries:

SELECT TOP 10
    u.Id                AS "UserId",
    u.CreationDate      AS "Member since",
    ISNULL((SELECT AVG(Score) FROM Questions WHERE OwnerUserId = u.Id), 0)
                        AS "Average Question Rating",  
           (SELECT COUNT(*)   FROM Questions WHERE OwnerUserId = u.Id)
                        AS "N. of Questions posted by the agent",
    ISNULL((SELECT AVG(Score) FROM Answers   WHERE OwnerUserId = u.Id), 0)
                        AS "Average Answer Rating",  
           (SELECT COUNT(*)   FROM Answers   WHERE OwnerUserId = u.Id)
                        AS "N. of Answers posted by the agent"
FROM  Users u

Alternatively, using joins:

SELECT TOP 10
     u.Id                AS "UserId",
     u.CreationDate      AS "Member since",
     ISNULL(q.a, 0)      AS "Average Question Rating",  
     ISNULL(q.c, 0)      AS "N. of Questions posted by the agent",
     ISNULL(a.a, 0)      AS "Average Answer Rating",  
     ISNULL(a.c, 0)      AS "N. of Answers posted by the agent"
FROM Users u
-- If you LEFT JOIN these tables, you'll get also results for users without
-- questions or answers
LEFT OUTER JOIN (SELECT OwnerUserId, AVG(Score) a, COUNT(*) c 
     FROM Questions GROUP BY OwnerUserId) q
     ON  q.OwnerUserId = u.Id
LEFT OUTER JOIN (SELECT OwnerUserId, AVG(Score) a, COUNT(*) c 
     FROM Answers GROUP BY OwnerUserId) a
     ON  a.OwnerUserId = u.Id

I don't know SQL Server's query optimiser well enough, so I can't say which one is going to be faster. The first solution could take advantage of scalar subquery caching, if that is available in SQL Server. Otherwise, the second query may perform fewer nested loops.
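
One way to settle the question empirically, rather than by guessing, is to run both versions with SQL Server's statistics output switched on. This is only a sketch: the shortened query below is a stand-in for whichever full query is being measured, and the numbers depend entirely on your data and indexes.

```sql
-- Report logical reads and CPU/elapsed time for each statement that follows.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Stand-in for the version under test; replace it with the full
-- subquery-based query, then with the join-based one, and compare.
SELECT TOP 10
    u.Id,
    (SELECT COUNT(*) FROM Questions WHERE OwnerUserId = u.Id) AS QuestionCount
FROM Users u;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

In Management Studio the figures appear in the Messages tab. Comparing the actual execution plans of the two queries is another option.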

As noted elsewhere, your join on user ID on both questions and answers essentially produces a cartesian join at the user level between the two tables. A better approach would be to use a union:

SELECT TOP 10
    u.Id                AS "UserId",
    u.CreationDate      AS "Member since",
    AVG(q_score)        AS "Average Question Rating",  
    COUNT(q_id)         AS "N. of Questions posted by the agent",
    AVG(a_score)        AS "Average Answer Rating",  
    COUNT(a_id)         AS "N. of Answers posted by the agent"
FROM Users u
JOIN (select OwnerUserId,
             Score        q_score,
             Id           q_id,
             NULL         a_score,
             NULL         a_id
      from Questions
      union all
      select OwnerUserId,
             NULL         q_score,
             NULL         q_id,
             Score        a_score,
             Id           a_id
      from Answers) qa
  ON qa.OwnerUserId = u.Id
GROUP BY u.Id, u.CreationDate
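
The reason the union avoids the problem is that the rows are stacked rather than multiplied, and COUNT and AVG skip NULLs. A minimal sketch with made-up data (the CTEs and their Ids and Scores are hypothetical, and the NULLs are cast explicitly to keep the column types unambiguous):

```sql
-- Hypothetical sample data: one user with 2 questions and 3 answers.
WITH Questions (Id, OwnerUserId, Score) AS (
    SELECT 101, 1, 10 UNION ALL
    SELECT 102, 1, 20
),
Answers (Id, OwnerUserId, Score) AS (
    SELECT 201, 1, 1 UNION ALL
    SELECT 202, 1, 2 UNION ALL
    SELECT 203, 1, 3
)
SELECT
    qa.OwnerUserId,
    COUNT(*)        AS StackedRows,    -- 2 + 3 = 5 rows, not 2 x 3
    COUNT(qa.q_id)  AS QuestionCount,  -- 2: NULL q_id values are not counted
    COUNT(qa.a_id)  AS AnswerCount,    -- 3: NULL a_id values are not counted
    AVG(qa.q_score) AS AvgQuestion,    -- 15: averages the two question scores only
    AVG(qa.a_score) AS AvgAnswer       -- 2: averages the three answer scores only
FROM (
    SELECT OwnerUserId,
           Score             AS q_score,
           Id                AS q_id,
           CAST(NULL AS int) AS a_score,
           CAST(NULL AS int) AS a_id
    FROM Questions
    UNION ALL
    SELECT OwnerUserId,
           CAST(NULL AS int),
           CAST(NULL AS int),
           Score,
           Id
    FROM Answers
) qa
GROUP BY qa.OwnerUserId;
```

SQL Server may print a warning that NULL values were eliminated by an aggregate; that is exactly the behaviour this approach relies on.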

Wouldn't just counting the DISTINCT Ids work?

SELECT TOP 10 
    u.Id                         AS "UserId", 
    u.CreationDate               AS "Member since", 
    AVG(q.Score)                 AS "Average Question Rating",   
    COUNT(DISTINCT q.Id)         AS "N. of Questions posted by the agent", 
    AVG(a.Score)                 AS "Average Answer Rating",   
    COUNT(DISTINCT a.Id)          AS "N. of Answers posted by the agent" 
FROM     
        Users u,  
        Answers a,  
        Questions q 
WHERE q.OwnerUserId = u.Id 
AND a.OwnerUserId = u.Id 
GROUP BY u.Id, u.CreationDate 
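
A small sketch of what this does and does not fix, again with made-up data (the CTEs, Ids and Scores are hypothetical). User 1 has 2 questions and 3 answers; user 2 has one question and no answers.

```sql
WITH Users (Id) AS (
    SELECT 1 UNION ALL
    SELECT 2
),
Questions (Id, OwnerUserId, Score) AS (
    SELECT 101, 1, 10 UNION ALL
    SELECT 102, 1, 20 UNION ALL
    SELECT 103, 2, 5
),
Answers (Id, OwnerUserId, Score) AS (
    SELECT 201, 1, 1 UNION ALL
    SELECT 202, 1, 2 UNION ALL
    SELECT 203, 1, 3
)
SELECT
    u.Id                 AS UserId,
    COUNT(DISTINCT q.Id) AS QuestionCount,  -- 2: duplicates collapse
    COUNT(DISTINCT a.Id) AS AnswerCount,    -- 3: duplicates collapse
    AVG(q.Score)         AS AvgQuestion,    -- 15: each question repeats equally often
    SUM(q.Score)         AS SumQuestion     -- 90, not 30: sums are still inflated
FROM Users u, Answers a, Questions q
WHERE q.OwnerUserId = u.Id
  AND a.OwnerUserId = u.Id
GROUP BY u.Id;
-- Only user 1 is returned; user 2 has no answers, so the join drops that row.
```

So the distinct counts come out right for users who have both questions and answers, but the query still builds the full per-user cartesian product, any SUM over it is inflated, and users missing either kind of post disappear from the result.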

If it were me, I would do explicit joins on those other tables (Answers and Questions). How is it linking the other tables if you don't do a join?

SELECT TOP 10
    u.Id                AS "UserId",
    u.CreationDate      AS "Member since",
    AVG(q.Score)        AS "Average Question Rating",  
    COUNT(q.Id)         AS "N. of Questions posted by the agent",
    AVG(a.Score)        AS "Average Answer Rating",  
    COUNT(a.Id)         AS "N. of Answers posted by the agent"
FROM    
        Users u, 
        Answers a, 
        Questions q
WHERE q.OwnerUserId = u.Id
AND a.OwnerUserId = u.Id
GROUP BY u.Id, u.CreationDate

would be

SELECT TOP 10
    u.Id                AS "UserId",
    u.CreationDate      AS "Member since",
    AVG(q.Score)        AS "Average Question Rating",  
    COUNT(q.Id)         AS "N. of Questions posted by the agent",
    AVG(a.Score)        AS "Average Answer Rating",  
    COUNT(a.Id)         AS "N. of Answers posted by the agent"
FROM    
        Users u
JOIN Answers a ON a.OwnerUserId = u.Id     -- assuming that's how answers and users are linked
JOIN Questions q ON q.OwnerUserId = u.Id   -- assuming that's how questions and users are linked
GROUP BY u.Id, u.CreationDate
