How can a sub-query refer to a table outside it?

I am trying to understand how a sub-query within a JOIN can refer to a field in the outer query.

The vehicles table stores the current information on the vehicles used in a company. The whole vehicle history is stored in a table named vehicles_aud, whose structure is identical to that of the vehicles table except that it also includes a reference to another table, called revisions, which stores information about who made a change to the main table, when, why, etc.

To get the most recent action performed on each vehicle, a simple JOIN like this is used:

SELECT *
FROM vehicles v
    JOIN vehicles_aud vu ON vu.id=v.id AND vu.revision_id=(
        SELECT max(revision_id)
        from vehicles_aud
        WHERE id=v.id
    )
    JOIN revisions r ON r.id=vu.revision_id

Please don't mind the asterisk in the SELECT list: listing real columns there makes no difference to my question below. To be precise, this query can also be rewritten the following way for better understanding:

SELECT *
FROM vehicles v
    CROSS APPLY (
        SELECT TOP 1 *
        FROM vehicles_aud
        WHERE id=v.id
        ORDER BY revision_id DESC
    ) vu
    JOIN revisions r ON r.id=vu.revision_id

In the second example, a JOIN cannot be used in place of CROSS APPLY.

I assume the sub-query in the first example should be used with the CROSS APPLY operator, because it refers to the id field of the vehicles table outside the sub-query, but in real life the query with the JOIN above works well. How is that even possible without CROSS APPLY? In other words, in what cases and under what circumstances can a sub-query refer to fields of a table outside the sub-query?

Using analytic functions is one way to go here:

SELECT TOP 1 WITH TIES *
FROM vehicles v
INNER JOIN vehicles_aud vu ON vu.id = v.id
INNER JOIN revisions r ON r.id = vu.revision_id
ORDER BY ROW_NUMBER() OVER (PARTITION BY v.id ORDER BY vu.revision_id DESC);

The above query returns, for each vehicles.id value, the record with the maximum revision_id: TOP 1 WITH TIES keeps every row whose ROW_NUMBER() is 1, and there is exactly one such row per vehicle.
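To see why this yields one row per vehicle, here is a plain-Python sketch of what ORDER BY ROW_NUMBER() OVER (PARTITION BY ...) combined with TOP 1 WITH TIES computes. The sample rows and the function name are hypothetical, and this only mimics the logical result; SQL Server computes it with its own plan.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical joined rows: (vehicle_id, revision_id); not data from the question.
rows = [(1, 10), (1, 12), (2, 11), (2, 15), (2, 13), (3, 14)]


def top1_with_ties_by_row_number(rows):
    """Mimic SELECT TOP 1 WITH TIES ... ORDER BY
    ROW_NUMBER() OVER (PARTITION BY vehicle_id ORDER BY revision_id DESC)."""
    numbered = []
    # PARTITION BY vehicle_id: groupby needs its input sorted on the partition key.
    for _, part in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0)):
        # ORDER BY revision_id DESC inside the partition, then number rows 1, 2, 3, ...
        ordered = sorted(part, key=itemgetter(1), reverse=True)
        numbered.extend((rn, row) for rn, row in enumerate(ordered, start=1))
    if not numbered:
        return []
    # TOP 1 WITH TIES keeps every row tied with the smallest ORDER BY key,
    # and that key is ROW_NUMBER() = 1 in every partition.
    first = min(rn for rn, _ in numbered)
    return [row for rn, row in numbered if rn == first]


print(top1_with_ties_by_row_number(rows))  # [(1, 12), (2, 15), (3, 14)]
```

Because every partition restarts its numbering at 1, the "ties" of the first row are exactly the first rows of all the other partitions.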

I'm not sure whether this really answers your question...

In short: any kind of JOIN creates two result sets and matches them on the given condition, while any kind of APPLY calls the operation row by row. If the APPLY returns more than one row, a result set is added (similar to a JOIN), while with single-row results the engine simply adds the columns.
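The two logical models can be sketched in plain Python. This is only an illustration of the semantics described above, with hypothetical data and function names; as the next paragraphs point out, the optimizer is free to execute either form quite differently.

```python
# Hypothetical sample data shaped like the question's tables.
vehicles = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]
vehicles_aud = [
    {"id": 1, "revision_id": 10}, {"id": 1, "revision_id": 12},
    {"id": 2, "revision_id": 11}, {"id": 2, "revision_id": 15},
    {"id": 3, "revision_id": 14},
]


def apply_style(outer, inner):
    """Logical APPLY / correlated-subquery semantics: re-run the inner query per outer row."""
    result = []
    for v in outer:
        # The inner "query" can see the current outer row, i.e. v["id"].
        revs = [a["revision_id"] for a in inner if a["id"] == v["id"]]
        if revs:  # as with an inner JOIN, vehicles without history are dropped
            result.append((v["id"], max(revs)))
    return result


def join_style(outer, inner):
    """Set-based semantics: build the inner result once, then match it on a condition."""
    latest = {}
    for a in inner:
        latest[a["id"]] = max(latest.get(a["id"], a["revision_id"]), a["revision_id"])
    return [(v["id"], latest[v["id"]]) for v in outer if v["id"] in latest]


print(apply_style(vehicles, vehicles_aud))  # [(1, 12), (2, 15), (3, 14)]
print(join_style(vehicles, vehicles_aud))   # [(1, 12), (2, 15), (3, 14)]
```

Both functions return the same rows; only the way the inner result is obtained differs, which is one reason the optimizer can often produce the same plan for both forms.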

The reality is much more complicated.

The engine is very smart and decides on the best plan after checking statistics, indexes, existing plans and so on. It is very likely that the actual plan you get is not what you'd expect, and fairly likely that seemingly different queries end up with the same plan.

Try the following with "Include Actual Execution Plan" switched on (Ctrl+M in SSMS):

USE master;
GO
CREATE DATABASE testPlan;
GO
USE testPlan;
GO

CREATE TABLE t1 (ID INT IDENTITY CONSTRAINT pk PRIMARY KEY, SomeValue VARCHAR(100));
INSERT INTO t1 VALUES('MaxVal will be 100'),('MaxVal will be 200'),('MaxVal will be 300');
GO

CREATE TABLE t2(fkID INT CONSTRAINT fk FOREIGN KEY REFERENCES t1(ID),TheValue INT);
INSERT INTO t2 VALUES(1,1),(1,2),(1,100)
                    ,(2,1),(2,2),(2,200)
                    ,(3,1),(3,2),(3,300);
GO

--a scalar computation using MAX()
SELECT *
      ,(SELECT MAX(t2.TheValue) FROM t2 WHERE t1.ID=t2.fkID) AS MaxVal
FROM t1

--the same as above, but with APPLY
SELECT *
FROM t1
CROSS APPLY(SELECT MAX(t2.TheValue) FROM t2 WHERE t1.ID=t2.fkID) A(MaxVal)

--Now we pick the TOP 1 after an ORDER BY
SELECT *
      ,(SELECT TOP 1 t2.TheValue FROM t2 WHERE t1.ID=t2.fkID ORDER BY t2.TheValue DESC) AS MaxVal
FROM t1

--and again the same with APPLY
SELECT *
FROM t1
CROSS APPLY(SELECT TOP 1 t2.TheValue FROM t2 WHERE t1.ID=t2.fkID ORDER BY t2.TheValue DESC) A(MaxVal)

--Tim's very slick TOP 1 WITH TIES approach
SELECT TOP 1 WITH TIES *
FROM t1 INNER JOIN t2 ON t1.ID=t2.fkID
ORDER BY ROW_NUMBER() OVER(PARTITION BY t1.ID ORDER BY t2.TheValue DESC);

GO
USE master;
GO
--careful with real data!
--DROP DATABASE testPlan;
GO
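If you don't have a SQL Server instance at hand, the two correlated-subquery variants can be tried against the same data in SQLite, here through Python's built-in sqlite3 module. This is a swapped-in engine: TOP becomes LIMIT, there is no CROSS APPLY, and the plans will not match SQL Server's. It only shows that the correlated forms return the expected values.

```python
import sqlite3


def run_demo():
    """Rebuild the t1/t2 test data in an in-memory SQLite database and run both variants."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE t1 (ID INTEGER PRIMARY KEY, SomeValue TEXT);
        INSERT INTO t1 (SomeValue)
        VALUES ('MaxVal will be 100'), ('MaxVal will be 200'), ('MaxVal will be 300');
        CREATE TABLE t2 (fkID INTEGER REFERENCES t1(ID), TheValue INTEGER);
        INSERT INTO t2 VALUES (1,1),(1,2),(1,100),
                              (2,1),(2,2),(2,200),
                              (3,1),(3,2),(3,300);
        """
    )
    # Scalar correlated subquery with MAX(): t1.ID comes from the outer query.
    scalar_max = conn.execute(
        """SELECT t1.ID,
                  (SELECT MAX(t2.TheValue) FROM t2 WHERE t1.ID = t2.fkID) AS MaxVal
           FROM t1 ORDER BY t1.ID"""
    ).fetchall()
    # SQLite has no TOP, so ORDER BY ... LIMIT 1 stands in for TOP 1.
    top_one = conn.execute(
        """SELECT t1.ID,
                  (SELECT t2.TheValue FROM t2 WHERE t1.ID = t2.fkID
                   ORDER BY t2.TheValue DESC LIMIT 1) AS MaxVal
           FROM t1 ORDER BY t1.ID"""
    ).fetchall()
    conn.close()
    return scalar_max, top_one


scalar_max, top_one = run_demo()
print(scalar_max)  # [(1, 100), (2, 200), (3, 300)]
print(top_one)     # [(1, 100), (2, 200), (3, 300)]
```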

The plan for the "scalar MAX" query uses a table scan on 27(!) rows, reduced to 9. The same approach with APPLY has the same plan. The engine is smart enough to see that this does not need a fully blown result set. As a side note: with APPLY you can use MaxVal like a variable elsewhere in the query, which is very helpful...

The plan with TOP 1 in a sub-query is the most expensive in this tiny test. It starts the same way as above (table scan on 27 rows, reduced to 9), but has to add a sort operation. The variant with APPLY is roughly the same.

The approach with TOP 1 WITH TIES takes the 9 rows of t2 and sorts them. The following operation works on those 9 rows, followed by one more sort and the reduction to the TOP rows.

In this case the first query is the fastest, by far.

But in (your) reality the actual behavior will depend on existing indexes, statistics and the actual row counts. Furthermore, you have one additional level (one more table) in between. The more complex a query is, the harder it is for the optimizer to find the best plan.

Conclusion

If performance matters, race your horses and take measurements. If performance is not that important, choose the query that is easiest to read, understand and maintain.
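Racing the horses can be as simple as running every candidate against the same data, checking that they agree, and timing them. The sketch below does this with SQLite via Python's sqlite3 and timeit modules; the data generator, table sizes and index are hypothetical, and SQLite's numbers say nothing about SQL Server, where you would compare actual execution plans and SET STATISTICS TIME ON / SET STATISTICS IO ON output on your real tables.

```python
import sqlite3
import timeit


def build_db(parents=1000, children=10):
    """Hypothetical test data: each t1 row gets `children` rows in t2."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (ID INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE t2 (fkID INTEGER, TheValue INTEGER)")
    conn.executemany("INSERT INTO t1 (ID) VALUES (?)", [(i,) for i in range(1, parents + 1)])
    conn.executemany(
        "INSERT INTO t2 VALUES (?, ?)",
        [(i, i * 100 + j) for i in range(1, parents + 1) for j in range(children)],
    )
    conn.execute("CREATE INDEX ix_t2 ON t2 (fkID, TheValue)")
    return conn


QUERIES = {
    "scalar MAX": "SELECT t1.ID, (SELECT MAX(TheValue) FROM t2 WHERE t2.fkID = t1.ID) FROM t1",
    "ORDER BY + LIMIT 1": "SELECT t1.ID, (SELECT TheValue FROM t2 WHERE t2.fkID = t1.ID "
                          "ORDER BY TheValue DESC LIMIT 1) FROM t1",
    # ROW_NUMBER() needs SQLite 3.25 or later.
    "ROW_NUMBER": "SELECT fkID, TheValue FROM (SELECT fkID, TheValue, ROW_NUMBER() OVER "
                  "(PARTITION BY fkID ORDER BY TheValue DESC) AS rn FROM t2) WHERE rn = 1",
}


def race(conn, queries, repeat=3, number=20):
    """Check that all queries agree, then return best-of-`repeat` timings in seconds."""
    answers = {name: sorted(conn.execute(sql).fetchall()) for name, sql in queries.items()}
    reference = next(iter(answers.values()))
    consistent = all(rows == reference for rows in answers.values())
    timings = {
        # The default argument binds the current query string to each lambda.
        name: min(timeit.repeat(lambda sql=sql: conn.execute(sql).fetchall(),
                                repeat=repeat, number=number))
        for name, sql in queries.items()
    }
    return consistent, timings


conn = build_db()
consistent, timings = race(conn, QUERIES)
print("same results:", consistent)
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:20} {seconds:.4f} s")
```

Checking that the candidates agree before comparing their speed matters: a fast query that returns different rows has not won the race.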

This is your first query:

SELECT *
FROM vehicles v JOIN
     vehicles_aud va
     ON va.id = v.id AND
        va.revision_id = (SELECT MAX(va2.revision_id)
                          FROM vehicles_aud va2
                          WHERE va2.id = v.id
--------------------------------^
                         ) JOIN
     revisions r
     ON r.id = va.revision_id;

I assume your question is about this clause. It is the correlation clause of a correlated subquery. The use of table aliases clarifies what is happening.

Logically, what happens is that for each row in the outer query, the inner query is run with a separate value for v.id. As you seem to know, it pulls the most recent value of revision_id.

Some people have an unnatural bias against correlated subqueries, thinking that the database actually cycles through all the rows. Remember, SQL is a descriptive (declarative) language. Although the description above says what the processing computes, it is not, in general, what actually happens. In particular, under some circumstances correlated subqueries can be the most efficient mechanism.
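The same correlated subquery can be run essentially unchanged in SQLite through Python's sqlite3 module, which shows that no APPLY is needed here. A derived table in the FROM clause (the second query in the question) cannot see the other tables of that same FROM clause, which is why it needs CROSS APPLY; a subquery in an ON, WHERE or SELECT clause, by contrast, can refer to outer tables already in scope, such as v. The table contents and the changed_by/plate columns below are hypothetical stand-ins for the real schema.

```python
import sqlite3

# Minimal, hypothetical versions of the question's tables (column names assumed).
SCHEMA = """
CREATE TABLE revisions (id INTEGER PRIMARY KEY, changed_by TEXT);
CREATE TABLE vehicles (id INTEGER PRIMARY KEY, plate TEXT);
CREATE TABLE vehicles_aud (id INTEGER, revision_id INTEGER REFERENCES revisions(id), plate TEXT);
INSERT INTO revisions VALUES (10, 'ann'), (11, 'bob'), (12, 'cid'), (15, 'dee');
INSERT INTO vehicles VALUES (1, 'AB-123'), (2, 'CD-456');
INSERT INTO vehicles_aud VALUES (1, 10, 'AB-000'), (1, 12, 'AB-123'),
                                (2, 11, 'CD-000'), (2, 15, 'CD-456');
"""

# The asker's first query: the subquery in the ON clause refers to v.id from the outer query.
CORRELATED = """
SELECT v.id, va.revision_id, r.changed_by
FROM vehicles v
JOIN vehicles_aud va
  ON va.id = v.id
 AND va.revision_id = (SELECT MAX(va2.revision_id)
                       FROM vehicles_aud va2
                       WHERE va2.id = v.id)  -- correlation clause
JOIN revisions r ON r.id = va.revision_id
ORDER BY v.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
latest = conn.execute(CORRELATED).fetchall()
print(latest)  # [(1, 12, 'cid'), (2, 15, 'dee')]
```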

A more "colloquial" way to write the query would use window functions:

SELECT *
FROM vehicles v JOIN
     (SELECT va.*,
             ROW_NUMBER() OVER (PARTITION BY va.id ORDER BY va.revision_id DESC) as seqnum
      FROM vehicles_aud va
     ) va
     ON va.id = v.id AND
        va.seqnum = 1 JOIN
     revisions r
     ON r.id = va.revision_id;
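The window-function version can be checked against the correlated one. Note that the ORDER BY inside the window must use va.revision_id, since no va2 alias exists in that derived table. The sketch below again uses SQLite through Python's sqlite3 as a stand-in for SQL Server, with hypothetical data and a hypothetical changed_by column; it only shows that both queries return the same latest revision per vehicle.

```python
import sqlite3

# Hypothetical data; ROW_NUMBER() requires SQLite 3.25 or later.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE revisions (id INTEGER PRIMARY KEY, changed_by TEXT);
CREATE TABLE vehicles (id INTEGER PRIMARY KEY);
CREATE TABLE vehicles_aud (id INTEGER, revision_id INTEGER);
INSERT INTO revisions VALUES (10, 'ann'), (11, 'bob'), (12, 'cid'), (15, 'dee');
INSERT INTO vehicles VALUES (1), (2);
INSERT INTO vehicles_aud VALUES (1, 10), (1, 12), (2, 11), (2, 15);
""")

# Window-function version: no correlation, the derived table is computed on its own.
WINDOWED = """
SELECT v.id, va.revision_id, r.changed_by
FROM vehicles v
JOIN (SELECT va.*,
             ROW_NUMBER() OVER (PARTITION BY va.id ORDER BY va.revision_id DESC) AS seqnum
      FROM vehicles_aud va) va
  ON va.id = v.id AND va.seqnum = 1
JOIN revisions r ON r.id = va.revision_id
ORDER BY v.id
"""

# Correlated version from the question, for comparison.
CORRELATED = """
SELECT v.id, va.revision_id, r.changed_by
FROM vehicles v
JOIN vehicles_aud va
  ON va.id = v.id
 AND va.revision_id = (SELECT MAX(va2.revision_id) FROM vehicles_aud va2 WHERE va2.id = v.id)
JOIN revisions r ON r.id = va.revision_id
ORDER BY v.id
"""

windowed = conn.execute(WINDOWED).fetchall()
correlated = conn.execute(CORRELATED).fetchall()
print(windowed)                # [(1, 12, 'cid'), (2, 15, 'dee')]
print(windowed == correlated)  # True
```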
