Sub queries vs joins when counting children from same table and occurrences in other table

CREATE TABLE [dbo].[Comment] 
(
  [Id] [int] IDENTITY(1,1) NOT NULL,
  [UserId] [int] NOT NULL,
  [Comment] [nvarchar](1024) NOT NULL,
  [Created] [datetime] NOT NULL CONSTRAINT [DF_Comment_Created]  DEFAULT (getdate()),
  [ContentId] [int] NOT NULL,
  [ParentId] [int] NULL,
  CONSTRAINT [PK_Comment] PRIMARY KEY CLUSTERED 
  (
    [Id] ASC
  )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
  ) ON [PRIMARY]

CREATE TABLE [dbo].[CommentLike]
(
  [CommentId] [int] NOT NULL,
  [UserId] [int] NOT NULL,
  CONSTRAINT [PK_CommentLike] PRIMARY KEY CLUSTERED 
  (
    [CommentId] ASC,
    [UserId] ASC
  )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
  ) ON [PRIMARY]

CREATE TABLE [dbo].[User]
(
  [Id] [int] IDENTITY(1,1) NOT NULL,
  [Username] [nvarchar](64) NOT NULL,
  [FirstName] [nvarchar](50) NOT NULL,
  [LastName] [nvarchar](50) NOT NULL,
  [Email] [nvarchar](255) NOT NULL,
CONSTRAINT [PK_User] PRIMARY KEY CLUSTERED 
(
  [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY =  OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

There is a nonclustered index on Comment (ContentId, ParentId).

In short, the Comment table contains comments and sub comments via Id and ParentId. The CommentLike table contains likes (one like per user) for comments and sub comments.

The Comment table contains about 8000 rows and the CommentLike table about 1800.

I've created a query that lists comments (only top level comments), the sub comment count, the like count and also a value that indicates whether the supplied user likes each comment. The whole query is filtered on ContentId in the where clause (simply a unique integer value that represents an id in another system).

I have one version that uses sub queries and another that uses joins (on sub queries).

Sub query version:

select 
  c.Id,
  c.Comment,
  c.Created,
  c.ContentId,
  (select count(Id) from Comment where ParentId = c.Id) as SubComments,
  (select count(UserId) from CommentLike where CommentId = c.Id) as Likes,
  (select count(UserId) from CommentLike where CommentId = c.Id and UserId = @currentUserId) as CurrentUserIsLiking
from Comment c
where c.ContentId = @contentId and c.ParentId is null
group by
  c.Id, c.Comment, c.Created, c.ContentId

Join version:

select
  c.Id,
  c.Comment,
  c.Created,
  c.ContentId,
  isnull(c2.SubComments, 0) as SubComments,
  isnull(cl.Likes, 0) as Likes,
  isnull(cl.CurrentUserIsLiking, 0) as CurrentUserIsLiking
from Comment c
left join
(
  select
    ParentId,
    count(Id) as SubComments
  from Comment 
  group by ParentId
) as c2
on c.Id = c2.ParentId
left join
(
  select
    CommentId,
    count(UserId) as Likes,
    count(case when UserId = @currentUserId then 1 else null end) as CurrentUserIsLiking
  from CommentLike 
  group by CommentId
) as cl
on c.Id = cl.CommentId
where c.ContentId = @contentId and c.ParentId is null
group by
  c.Id, c.Comment, c.Created, c.ContentId, 
  c2.SubComments, cl.Likes, cl.CurrentUserIsLiking

On average both versions run below 600ms, but the sub query version always seems to run about 20% faster than the join version.

The question:

No matter how many rows the tables contain, the sub query version is always faster than the join version. I've always thought that, performance wise, joins are better than sub queries. Is that not true in this case? Since performance is important, I wonder if there is any optimization that can be done to either of the versions to make that specific version outperform the other?

Why are you aggregating in the outer query for the join version?

select c.*, 
       coalesce(p.SubComments, 0) as SubComments,
       coalesce(cl.Likes, 0) as Likes,
       coalesce(cl.CurrentUserIsLiking, 0) as CurrentUserIsLiking
from Comment c left join
     (select ParentId, count(*) as SubComments
      from Comment
      group by ParentId
     ) p
     on p.ParentId = c.Id left join
     (select CommentId, count(UserId) as Likes,
             sum(case when UserId = @currentUserId then 1 else 0
                 end) as CurrentUserIsLiking
      from CommentLike 
      group by CommentId
     ) cl
     on c.Id = cl.CommentId
where c.ContentId = @contentId and c.ParentId is null;

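Because the outer query is limiting the comments, it is quite reasonable that the version using correlated subqueries would be faster. The derived tables in the join version are written to aggregate over all of Comment and CommentLike, whereas the correlated subqueries only need to look at rows belonging to the top level comments that pass the where clause.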
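If you want the join-like layout while keeping the per-row behaviour of the correlated subqueries, one option is OUTER APPLY. The following is only a sketch against the tables in the question, not something measured on your data. The aliases sc, s and l are made up for illustration, and whether it beats either of your versions has to be checked against the actual execution plans.

```sql
-- Sketch only: same result columns as the queries in the question,
-- with each count computed per outer row via OUTER APPLY.
select
  c.Id,
  c.Comment,
  c.Created,
  c.ContentId,
  sc.SubComments,
  cl.Likes,
  cl.CurrentUserIsLiking
from Comment c
outer apply
(
  -- An aggregate without GROUP BY always returns exactly one row,
  -- so count(*) is 0 (not NULL) when a comment has no sub comments.
  select count(*) as SubComments
  from Comment s
  where s.ParentId = c.Id
) as sc
outer apply
(
  select
    count(*) as Likes,
    -- count() ignores NULLs, so this counts only the current user's like (0 or 1).
    count(case when l.UserId = @currentUserId then 1 end) as CurrentUserIsLiking
  from CommentLike l
  where l.CommentId = c.Id
) as cl
where c.ContentId = @contentId and c.ParentId is null;
```

The likes lookup can seek on the existing primary key (CommentId, UserId). The sub comment count filters on ParentId alone, which is not the leading column of the (ContentId, ParentId) index, so an index that leads with ParentId may help. The index name below is hypothetical:

```sql
-- Hypothetical index name; verify the benefit in the execution plan before keeping it.
create nonclustered index IX_Comment_ParentId on dbo.Comment (ParentId);
```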
