Why does the query optimizer use a sort after a merge join?

Question

Consider this query:

select
    map,line,pda,item,qty,qty_gift,pricelist,price,linevalue,vat,
    vat_value,disc_perc,disc_value,dt_disc_value,netvalue,imp_qty,
    imp_value,exp_qty,exp_value,price1,price2,price3,price4,
    justification,notes
  from appnameV2_Developer.dbo.pt
  where exists (select 1 from [dbo].[dt] dt 
                where pt.map=dt.map and dt.pda=pt.pda and dt.canceled=0)
except
select 
    map,line,pda,item,qty,qty_gift,pricelist,price,linevalue,vat,
    vat_value,disc_perc,disc_value,dt_disc_value,netvalue,imp_qty,
    imp_value,exp_qty,exp_value,price1,price2,price3,price4,
    justification,notes
  from appnameV2_Developer_reporting.dbo.pt

I wrote this to make sure there is no data difference in the same table (pt) between a replication publisher database (appnameV2_Developer) and its subscriber database (appnameV2_Developer_reporting). The replication article in question has a semijoin on dt.

dt is a transaction header table with PK (map, pda).

pt is a transaction detail table with PK (map, pda, line).

Here's the execution plan

So, we have a Right Semi Join implemented as a merge join. I would expect its result to be ordered by (map, pda, line). But then a sort operator on (map, pda, line) is applied.

Why does this sort occur (or, more accurately: why is the data not already sorted by that point)? Is the query optimizer lacking the logic of "when merge joining, the output is (still) sorted on the join predicates"? Am I missing something?

Answer 1

Because it decided to use a Merge Join to execute the EXCEPT clause. To perform a Merge Join, both datasets must have the same ordering.
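To illustrate why a merge-based EXCEPT needs both inputs in the same order, here is a minimal Python sketch. It is not SQL Server's implementation; the function name `merge_except` and the sample rows, written as (map, pda, line) tuples, are made up for illustration.

```python
def merge_except(left, right):
    """Yield the distinct rows of `left` that do not appear in `right`.

    Assumption: both inputs are sorted ascending on the whole row.
    The single forward pass below is only correct under that assumption.
    """
    right_iter = iter(right)
    current = next(right_iter, None)
    previous = None
    for row in left:
        if row == previous:
            continue  # EXCEPT returns distinct rows
        previous = row
        # Advance the right side until it catches up with the left row.
        while current is not None and current < row:
            current = next(right_iter, None)
        if current is None or current != row:
            yield row


# Hypothetical (map, pda, line) rows, both sorted the same way.
publisher = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (2, 1, 1)]
subscriber = [(1, 1, 1), (1, 2, 1), (2, 1, 1)]
print(list(merge_except(publisher, subscriber)))

# Same subscriber rows in a different order: the single pass now
# misses matches and reports rows that are not real differences.
subscriber_unsorted = [(2, 1, 1), (1, 1, 1), (1, 2, 1)]
print(list(merge_except(publisher, subscriber_unsorted)))
```

With matching order, only the row missing from the subscriber comes out. With mismatched order the pass returns extra rows, which is why an engine has to guarantee the ordering, by a sort if necessary, before merging.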

The thing is, the inner Merge Join (before the EXCEPT) is based on the table dt, not on pt. Therefore, the resulting rows won't have the same ordering as the other side of the EXCEPT, which is based on pt.

Why does SQL Server do that? Not clear. I would have done it differently. Maybe the stats are not up to date. Maybe there is only a small number of rows, so the strategy does not matter much.

Answer 2

The results from the first merge will be sorted by map, pda, line. However, you yourself mentioned join predicates, and the join predicates for this first merge are based only on map, pda (they're the predicates from inside the exists clause, except that the canceled one has been pushed down to the index scan). All that the first merge required was input sorted by map and pda, so that's the only sort order guaranteed on that data, so far as the rest of the query is concerned.

But as we know, the outputs from this first merge were actually derived from input that was additionally sorted by line. It appears the optimizer isn't currently able to spot this circumstance. It may be that the order of optimizations means it's unlikely ever to recognise this situation. So currently, it introduces the extra sort.
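The distinction between "required order" and "actual order" can be sketched in Python. This is only an illustration under assumed data, not how SQL Server executes the plan; `merge_semi_join` and the sample rows are hypothetical.

```python
def merge_semi_join(detail, header_keys):
    """Yield detail rows (map, pda, line) whose (map, pda) is in header_keys.

    Assumptions: `detail` is sorted on (map, pda) and `header_keys` is a
    sorted list of unique (map, pda) pairs. Nothing here looks at `line`,
    so the algorithm itself promises order on (map, pda) only.
    """
    keys = iter(header_keys)
    key = next(keys, None)
    for row in detail:
        row_key = row[:2]
        # Advance the header side until it reaches the detail key.
        while key is not None and key < row_key:
            key = next(keys, None)
        if key == row_key:
            yield row


# Hypothetical non-canceled header keys from dt.
dt_keys = [(1, 1), (2, 1)]

# pt read in primary key order: sorted on (map, pda, line).
pt_by_pk = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (2, 1, 1)]
# pt sorted on (map, pda) only: still a valid input for the join.
pt_by_join_keys = [(1, 1, 2), (1, 1, 1), (1, 2, 1), (2, 1, 1)]

out_pk = list(merge_semi_join(pt_by_pk, dt_keys))
out_keys = list(merge_semi_join(pt_by_join_keys, dt_keys))
print(out_pk == sorted(out_pk))      # full order happened to survive
print(out_keys == sorted(out_keys))  # only (map, pda) order is guaranteed
```

Both inputs satisfy what the join needs, but only the first one yields output that is also ordered by line. A consumer that knows only the join's requirement cannot rely on the extra ordering, which matches the extra sort described above.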

2 answers

Answer 1
0 2018-10-11 14:13:43

Answer 2
0 2018-10-11 14:40:56
