SQL Sub-query or INNER-JOIN?

I have the following two queries:

declare @UserId as int
set @UserId = 1

-- Query #1: Sub-query
SELECT
    u.[Id],
    u.[Name],
    u.[OrgId] AS Organization,
    (SELECT o.[Name] FROM Org o WHERE o.Id = u.OrgId) AS OrganizationName,
    [UserRoleId] AS UserRole,
    [UserCode] AS UserCode,
    [EmailAddress] AS EmailAddress,
    (SELECT SearchExpression FROM SearchCriteria WHERE UserId = @UserId AND IsDefault = 1) AS SearchCriteria,
    (SELECT PageSize FROM UserPreferences WHERE UserId = @UserId) AS UserPreferencePageSize,
    (SELECT DrilldownPageSize FROM UserPreferences WHERE UserId = @UserId) AS UserPreferenceDrilldownPageSize
FROM [User] AS u
WHERE u.Id = @UserId

-- Query #2: LEFT OUTER JOIN-query
SELECT
    u.[Id],
    u.[Name],
    u.[OrgId] AS Organization,
    (SELECT o.[Name] FROM Org o WHERE o.Id = u.OrgId) AS OrganizationName,
    [UserRoleId] AS UserRole,
    [UserCode] AS UserCode,
    [EmailAddress] AS EmailAddress,
    sc.SearchExpression AS SearchExpression,
    up.PageSize AS PageSize,
    up.DrilldownPageSize AS DrilldownPageSize
FROM [User] AS u
LEFT OUTER JOIN [UserPreferences] AS up ON u.Id = up.UserId
LEFT OUTER JOIN [SearchCriteria] AS sc ON u.Id = sc.UserId
WHERE ISNULL(sc.IsDefault, 1) = 1 AND u.Id = @UserId

Query execution plan statistics (query cost relative to the batch):

  • Query #1 (sub-query): 56%
  • Query #2 (JOIN): 44%

I thought the sub-query approach would be optimal, because the sub-queries are executed after the WHERE filter is applied. The statistics say that Query #2, the JOIN approach, is better.

Please advise. Also, as an intermediate SQL Server user, how can I work out which query is better (using anything other than the execution plan, if something else is more helpful)?

Thank you.

A join is faster than a sub-query.

A sub-query makes for busy disk access. Think of the hard disk's read-write head going back and forth as it accesses User, SearchExpression, PageSize, DrilldownPageSize, User, SearchExpression, PageSize, DrilldownPageSize, User, and so on.

A join works by concentrating the operation on the result of the first two tables; each subsequent join then works on the in-memory (or cached-to-disk) result of the tables already joined, and so on. That means less head movement, and thus a faster query.

The best thing you can do is try both and compare which gives you the better performance. It's difficult to second-guess what the query optimiser will do (you could write two different queries that end up being optimised to the same execution plan).

To compare performance fairly, start each test from a level playing field by clearing the execution plan cache and the data cache before trying each query. This can be done using the following commands, though only do this on a development/test database server:

DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

The approach I usually take is to run each query 3 times with SQL Profiler running, so I can monitor the duration, reads, CPU and writes of the query, and then base my decision on those figures.

e.g.
1) clear the cache using the above commands
2) run the query and record the stats
3) clear the cache
4) run the query again
5) run the query again (this will use the cached execution plan/data)

Then repeat for the second query to compare.
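
If SQL Profiler is not to hand, similar figures can be collected from the query window itself. The following is a minimal sketch of one test run, not a definitive benchmark harness: the cut-down SELECT is only a placeholder for whichever query is being tested, and the CHECKPOINT step is an addition (it writes dirty pages to disk so that DBCC DROPCLEANBUFFERS leaves a cold data cache). As above, run it on a development/test server only.

```sql
-- Development/test server only: these commands affect the whole instance.
CHECKPOINT                 -- flush dirty pages so they can be dropped
DBCC FREEPROCCACHE         -- discard cached execution plans
DBCC DROPCLEANBUFFERS      -- discard cached data pages

SET STATISTICS IO ON       -- report logical/physical reads per table
SET STATISTICS TIME ON     -- report CPU time and elapsed time

declare @UserId as int
set @UserId = 1

-- Placeholder: substitute the full query under test here.
SELECT u.[Id], up.PageSize, up.DrilldownPageSize
FROM [User] AS u
LEFT OUTER JOIN [UserPreferences] AS up ON u.Id = up.UserId
WHERE u.Id = @UserId

SET STATISTICS TIME OFF
SET STATISTICS IO OFF
```

The read counts and timings appear on the Messages tab in Management Studio; comparing logical reads between the two queries is usually more repeatable than comparing elapsed time.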

It will depend largely on the cardinality of your data. If your in-line lookups are minimal compared to the overhead of joining vast amounts of data (when you only need to extract a small subsection from that join result), then the inline option will be quicker. But if the in-line selects carry substantial overhead (i.e. your result has a large number of rows, and you are calling an inline select for each and every row), then the join will be quicker.

I can't see from your question the numbers involved (i.e. how many rows), so it's hard to comment more specifically.

For example, if your result set has 10 rows, the inline selects will be carried out for each of those ten rows only, whereas the join might involve far more rows, which are then selectively reduced by WHERE clauses. But if you have a result set of 10 million rows, the inline selects will most probably kill performance, since the work is done row by row.

EXAMPLE: imagine you have to collect a load of bricks (specified by size etc.) from all over a builder's yard and paint them blue.

inline select = select all the bricks you need and then paint them by hand.

join = dump all the bricks into a huge bucket of paint, and then choose the ones you need.

If you only want to end up with 10 bricks, it is far quicker to select and then paint by hand. If you want a million bricks, then mass-painting them in a tub first is the way to go.

The relative cost of an execution plan isn't always a reliable indicator of performance.

I assume from your SQL that only 1 row should be returned. Provided that UserId is a unique key on User, the performance of your 2 approaches will be similar on most relational databases.

Things to bear in mind would be:

  • if UserPreferences or SearchCriteria return more than 1 row, the first approach will raise an SQL error, whereas the second approach will return more than 1 row.
  • the apparent extra lookup in the first approach (UserPreferences selected twice) has no real effect, because for the second lookup the record will already be in a buffer.
  • if for some reason the User table is tablespace scanned, the first approach will be much faster.
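
The first point can be reproduced with a small sketch. The table variables and sample rows below are hypothetical stand-ins, not the asker's real schema; they assume a user who has two search criteria rows both flagged as default.

```sql
-- Hypothetical demo data: user 1 has two "default" search criteria rows.
DECLARE @User TABLE (Id int PRIMARY KEY, Name varchar(50))
DECLARE @SearchCriteria TABLE (UserId int, SearchExpression varchar(100), IsDefault bit)

INSERT INTO @User VALUES (1, 'alice')
INSERT INTO @SearchCriteria VALUES (1, 'status = open', 1)
INSERT INTO @SearchCriteria VALUES (1, 'status = closed', 1)

-- Join form: succeeds, but returns two rows for the one user.
SELECT u.Id, sc.SearchExpression
FROM @User AS u
LEFT OUTER JOIN @SearchCriteria AS sc
    ON sc.UserId = u.Id AND sc.IsDefault = 1
WHERE u.Id = 1

-- Scalar sub-query form: fails at run time, because a sub-query
-- used as an expression may return at most one value.
SELECT u.Id,
       (SELECT sc.SearchExpression
        FROM @SearchCriteria AS sc
        WHERE sc.UserId = u.Id AND sc.IsDefault = 1) AS SearchCriteria
FROM @User AS u
WHERE u.Id = 1
```

On SQL Server the second statement stops with the error "Subquery returned more than 1 value", so the two forms are interchangeable only when each lookup table holds at most one matching row per user.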
